We extend Tropp's analysis of Orthogonal Matching Pursuit (OMP) using the Exact Recovery Condition (ERC) [1] to a first exact recovery analysis of Orthogonal Least Squares (OLS). We show that when ERC is met, OLS is guaranteed to exactly recover the unknown support. Moreover, we provide a closer look at the analysis of both OMP and OLS when ERC is not fulfilled. We show that there exist dictionaries for which some subsets are never recovered with OMP. This phenomenon, which also appears with $\ell_1$ minimization, does not occur for OLS. Finally, numerical experiments based on our theoretical analysis show that none of the considered algorithms is uniformly better than the other.

I. Introduction

Classical greedy subset selection algorithms include, by increasing order of complexity: Matching 
Pursuit (MP) [2], Orthogonal Matching Pursuit (OMP) [3] and Orthogonal Least Squares (OLS) [4,5]. 
OLS is indeed relatively expensive in comparison with OMP since OMP performs one linear inversion 
per iteration whereas OLS performs as many linear inversions as there are non-active atoms. We refer 
the reader to the technical report [6] for a comprehensive review on the difference between OMP and 
OLS. 

OLS is referred to by many other names in the literature. It is known as forward selection in statistical regression [7] and as the greedy algorithm [5], Order Recursive Matching Pursuit (ORMP) [8] and Optimized Orthogonal Matching Pursuit (OOMP) [9] in the signal processing literature, all these algorithms being actually the same. It is worth noticing that the above-mentioned algorithms were introduced by following either an optimization [4,7] or an orthogonal projection methodology [5], or both [8,9]. In the optimization viewpoint, the atom yielding the largest decrease of the approximation error is selected. This leads to a greedy sub-optimal algorithm dedicated to the minimization of the approximation error. In the orthogonal projection viewpoint, the atom selection rule is defined as an extension of the OMP rule: the data vector and the dictionary atoms are projected onto the subspace that is orthogonal to the span of the active atoms, and the normalized projected atom having the largest inner product with the data residual is selected. As the number of active atoms increases by one at each iteration, the projections are done onto a subspace whose dimension decreases.

A. Main objective of the paper 

Our primary goal is to address the OLS exact recovery analysis from noise-free data and to investigate 
the connection between the OMP and OLS exact recovery conditions. In the literature, much attention 
was paid to the exact recovery analysis of sparse algorithms that are faster than OLS, e.g., thresholding 
algorithms and simpler greedy algorithms like OMP [10]. But to the best of our knowledge, no exact 
recovery result is available for OLS. In their recent paper [11], Davies and Eldar mention this issue and 
state that the relation between OMP and OLS remains unclear. 

B. Existing results for OMP 

Our starting point is the existing analysis of OMP whose structure is somewhat close to OLS. Exact 
recovery studies rely on alternate methodologies. 

Tropp's Exact Recovery Condition (ERC) [1] is a necessary and sufficient condition of exact recovery in a worst case analysis. On the one hand, if a subset of $k$ atoms satisfies the ERC, then it can be recovered from any linear combination of the $k$ atoms in at most $k$ steps. On the other hand, when the ERC is not satisfied, one can generate a counterexample (i.e., a specific combination of the $k$ atoms) for which OMP fails, i.e., OMP selects a wrong atom during its first $k$ iterations. Specifically, the atom selected in the first iteration is a wrong one.

Davenport and Wakin [12] used another analysis to show that OMP yields exact support recovery 
under certain Restricted Isometry Property (RIP) assumptions. Actually, the ERC necessarily holds when 
Davenport and Wakin's condition is fulfilled since ERC is a necessary and sufficient condition of exact 
recovery. 

C. Generalization of Tropp's condition

We propose to extend Tropp's condition to OLS. We remark that the very first iteration of OLS is identical to that of OMP: the first selected atom is the one whose inner product with the input vector is maximal. Therefore, when ERC does not hold, the counterexample for which the first iteration of OMP fails also yields a failure of the first iteration of OLS. Hence one cannot expect to derive an exact recovery condition for OLS which is weaker than ERC at the first iteration. We show that the ERC indeed ensures the success of OLS.

We further address the case where ERC does not hold, i.e., the first iteration of OMP/OLS¹ is not guaranteed to succeed but nevertheless succeeds "by chance". We derive weaker conditions which guarantee that an exact support recovery occurs in the subsequent iterations. These extended recovery conditions coincide with ERC at the first iteration but differ from it from the second iteration onwards.

In summary, our main results state that: 

• Tropp's ERC is a sufficient condition of exact recovery for OLS (Theorem 2).

• When the early iterations of Oxx have all succeeded, we derive two sufficient conditions, named ERC-OMP and ERC-OLS, for the recovery of the remaining true atoms (Theorem 3).

• Moreover, we show that our conditions are, in some sense, necessary (Theorems 4 and 5).

D. Organization of the paper 

In Section II, we recall the principle of OMP and OLS and their interpretation in terms of orthogonal projections. Then, we properly define the notions of successful support recovery and support recovery failure. Section III is dedicated to the analysis of OMP and OLS at any iteration, where the most technical developments and proofs are omitted for readability reasons. These important elements can be found in Appendix A. In Section IV, we show using Monte Carlo simulations that there is no systematic implication between the ERC-OMP and ERC-OLS conditions but we exhibit some elements of discrimination between OMP and OLS.

II. Notations and prerequisites 

The following notations will be used in this paper. $\langle \cdot, \cdot \rangle$ refers to the inner product between vectors, and $\|\cdot\|$ and $\|\cdot\|_1$ stand for the Euclidean norm and the $\ell_1$ norm, respectively. $\cdot^\dagger$ denotes the pseudo-inverse of a matrix. For a full rank and undercomplete matrix, we have $X^\dagger = (X^t X)^{-1} X^t$ where $\cdot^t$ stands for the matrix transposition. When $X$ is overcomplete, spark$(X)$ denotes the minimum number of columns from $X$ that are linearly dependent [13]. The letter $Q$ denotes some subset of the column indices, and $X_Q$ is the submatrix of $X$ gathering the columns indexed by $Q$. Finally, $P_Q = X_Q X_Q^\dagger$ and $P_Q^\perp = I - P_Q$ denote the orthogonal projection operators on span$(X_Q)$ and span$(X_Q)^\perp$, where span$(X)$ stands for the column span of $X$, span$(X)^\perp$ is the orthogonal complement of span$(X)$ and $I$ is the identity matrix whose dimension is equal to the number of rows in $X$.

¹In the rest of the paper, we will use the notation Oxx when referring to properties that apply to both OMP and OLS.

A. Subset selection

Let $A = [a_1, \ldots, a_n]$ denote the dictionary gathering unitary atoms $a_i \in \mathbb{R}^m$. $A$ is a matrix of size $m \times n$. Assuming that the atoms are unitary is actually not necessary for OLS as the behavior of OLS is unchanged whether the atoms are normalized or not [6]. On the contrary, OMP is highly sensitive to the normalization of atoms since its selection rule involves the inner products between the current residual and the non-selected atoms.

We consider a subset $Q^\star$ of $\{1, \ldots, n\}$ of cardinality $k = \mathrm{Card}[Q^\star] < \min(m, n)$ and study the behavior of OMP and OLS for all inputs $y \in$ span$(A_{Q^\star})$, i.e., for any combination $y = A_{Q^\star} t$ where the submatrix $A_{Q^\star}$ is of size $m \times k$ and the weight vector $t \in \mathbb{R}^k$. The $k$ atoms $\{a_i, i \in Q^\star\}$ indexed by $Q^\star$ will be referred to as the "true" atoms while for the remaining ("wrong") atoms $\{a_i, i \notin Q^\star\}$, we will use the generic notation $a_{\mathrm{bad}}$. The forward greedy algorithms considered in this paper start from the empty support and select a new atom per iteration. At intermediate iterations $j \in \{0, \ldots, k-1\}$, we denote by $Q$ the current support (with $\mathrm{Card}[Q] = j$).

Throughout the paper, we make the general assumption that $A_{Q^\star}$ is full rank. It is important to mention that this assumption does not guarantee that the representation $y = A_{Q^\star} t$ is unique, i.e., there may be another $k$-term representation $y = A_{Q'} t'$ where $A_{Q'}$ includes some wrong atoms $a_{\mathrm{bad}}$. The stronger assumption spark$(A) > 2k$ is a necessary and sufficient condition for uniqueness of any $k$-term representation [13]. Therefore, when spark$(A) > 2k$, the selection of a wrong atom by a greedy algorithm rules out a $k$-term representation of $y$ in $k$ steps [1]. We make the weak assumption that $A_{Q^\star}$ is full rank because it is sufficient to elaborate our exact recovery conditions under which no wrong atom is selected in the first $k$ iterations.
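The uniqueness condition spark$(A) > 2k$ can be checked by exhaustive search on very small dictionaries. The following is a minimal sketch only: the function names `rank` and `spark`, the tolerance, and the convention of returning $n + 1$ when all columns are independent are illustrative assumptions, and the cost is exponential in $n$.

```python
from itertools import combinations

def rank(cols, tol=1e-10):
    """Rank of the matrix whose columns are `cols` (Gaussian elimination)."""
    rows = [list(r) for r in zip(*cols)]
    rk = 0
    for c in range(len(cols)):
        # Row with the largest entry in column c among rows not yet used as pivots
        pivot = max(range(rk, len(rows)), key=lambda r: abs(rows[r][c]), default=None)
        if pivot is None or abs(rows[pivot][c]) < tol:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            f = rows[r][c] / rows[rk][c]
            rows[r] = [x - f * p for x, p in zip(rows[r], rows[rk])]
        rk += 1
    return rk

def spark(atoms):
    """Smallest number of linearly dependent columns (brute force).
    Assumed convention: returns len(atoms) + 1 if no subset is dependent."""
    n = len(atoms)
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if rank([atoms[i] for i in subset]) < size:
                return size
    return n + 1

# Toy dictionary in R^2 (hypothetical data): any two atoms are independent
s = 2 ** -0.5
toy = [(1.0, 0.0), (0.0, 1.0), (s, s)]
toy_spark = spark(toy)            # three atoms in R^2 are always dependent
unique_for_k1 = toy_spark > 2 * 1 # uniqueness test for k = 1
```

With this toy dictionary, every 1-term representation is unique, whereas the test fails for $k = 2$.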

B. OMP and OLS algorithms 

The common feature between OMP and OLS is that they both perform an orthogonal projection whenever the support $Q$ is updated: the data approximation reads $P_Q y$ and the residual error is defined by

$$r_Q = y - P_Q y = P_Q^\perp y.$$

Let us now recall how the selection rule of OLS differs from that of OMP. 

At each iteration of OLS, the atom $a_\ell$ yielding the minimum least-square error $\|r_{Q \cup \{\ell\}}\|^2$ is selected:

$$\ell^{\mathrm{OLS}} \in \arg\min_{i \notin Q} \|r_{Q \cup \{i\}}\|^2$$

and $n - \mathrm{Card}[Q]$ least-square problems are solved to compute $\|r_{Q \cup \{i\}}\|^2$ for all $i \notin Q$ [4].² On the contrary, OMP adopts the simpler rule

$$\ell^{\mathrm{OMP}} \in \arg\max_{i \notin Q} |\langle r_Q, a_i \rangle|$$

to select the new atom and then solves only one least-square problem to compute $\|r_{Q \cup \{\ell\}}\|^2$ [6]. Depending on the application, the OMP and OLS stopping rules can involve a maximum number of atoms and/or a residual threshold. Note that when the data are noise-free (they read as $y = A_{Q^\star} t$) and no wrong atom is selected, the squared error $\|r_Q\|^2$ is equal to 0 after at most $k$ iterations. Therefore, we will consider no more than $k$ iterations in the following.
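The two selection rules can be sketched in a few lines of plain Python. This is an illustrative sketch rather than a reference implementation: the names `residual` and `greedy`, the tolerance, and the toy dictionary are assumptions, and the OLS branch deliberately solves one least-square problem per candidate atom, as in the description above, instead of using a recursive update.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residual(y, cols, tol=1e-12):
    """r = y - P y, where P projects orthogonally onto span(cols).
    The projection is obtained by Gram-Schmidt orthonormalization."""
    basis = []
    for c in cols:
        q = list(c)
        for b in basis:
            coef = dot(q, b)
            q = [qi - coef * bi for qi, bi in zip(q, b)]
        nq = math.sqrt(dot(q, q))
        if nq > tol:
            basis.append([qi / nq for qi in q])
    r = list(y)
    for b in basis:
        coef = dot(r, b)
        r = [ri - coef * bi for ri, bi in zip(r, b)]
    return r

def greedy(atoms, y, n_iter, rule, tol=1e-12):
    """Run at most n_iter iterations of OMP (rule='omp') or OLS (rule='ols')."""
    support = []
    for _ in range(n_iter):
        r = residual(y, [atoms[i] for i in support])
        if dot(r, r) <= tol:      # y is already represented exactly
            break
        candidates = [i for i in range(len(atoms)) if i not in support]
        if rule == "omp":
            # OMP: largest inner product with the current residual
            best = max(candidates, key=lambda i: abs(dot(r, atoms[i])))
        else:
            # OLS: smallest squared error after adding atom i
            def sq_err(i):
                ri = residual(y, [atoms[j] for j in support + [i]])
                return dot(ri, ri)
            best = min(candidates, key=sq_err)
        support.append(best)
    return support

# Toy example (hypothetical data): true atoms are indices 0 and 1
s = 1.0 / math.sqrt(3.0)
atoms = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (s, s, s)]
y = (2.0, 1.0, 0.0)
omp_support = greedy(atoms, y, 3, "omp")
ols_support = greedy(atoms, y, 3, "ols")
```

In this toy case both algorithms select the two true atoms and stop once the residual vanishes.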

C. Geometric interpretation 

A geometric interpretation in terms of orthogonal projections will be useful for deriving recovery 
conditions. It is essentially inspired by the technical report of Blumensath and Davies [6] and by Davenport 
and Wakin's analysis of OMP under the RIP assumption [12]. 

We introduce the notation $\tilde a_i = P_Q^\perp a_i$ for the projected atoms onto span$(A_Q)^\perp$ where for simplicity, the dependence upon $Q$ is omitted. When there is a risk of confusion, we will use $\tilde a_i^Q$ instead of $\tilde a_i$. Notice that $\tilde a_i = 0$ if and only if $a_i \in$ span$(A_Q)$. In particular, $\tilde a_i = 0$ for $i \in Q$. Finally, we define the normalized vectors

$$\tilde b_i = \begin{cases} \tilde a_i / \|\tilde a_i\| & \text{if } \tilde a_i \neq 0, \\ 0 & \text{otherwise.} \end{cases}$$

Again, we will use $\tilde b_i^Q$ when there is a risk of confusion.

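We now emphasize that the projected atoms $\tilde a_i$ (or $\tilde b_i$) play a central role in the analysis of both OMP and OLS. Because the residual $r_Q = P_Q^\perp y$ lies in span$(A_Q)^\perp$, $\langle r_Q, a_i \rangle = \langle r_Q, \tilde a_i \rangle$ and the OMP selection rule rereads:

$$\ell^{\mathrm{OMP}} \in \arg\max_{i \notin Q} |\langle r_Q, \tilde a_i \rangle| \qquad (1)$$

whereas for OLS, minimizing $\|r_{Q \cup \{i\}}\|^2$ with respect to $i \notin Q$ is equivalent to maximizing $\|r_Q\|^2 - \|r_{Q \cup \{i\}}\|^2 = \langle r_Q, \tilde b_i \rangle^2$ (see, e.g., [9] for a complete calculation):

$$\ell^{\mathrm{OLS}} \in \arg\max_{i \notin Q} |\langle r_Q, \tilde b_i \rangle|. \qquad (2)$$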

²Our purpose is not to focus on the OLS implementation. However, let us just mention that in the typical implementation, the least-square problems are solved recursively using the Gram-Schmidt orthonormalization procedure [4].

We notice that (1) and (2) only rely on the vectors $r_Q$ and $\tilde a_i$ belonging to the subspace span$(A_Q)^\perp$. OMP maximizes the inner product $|\langle r_Q, \tilde a_i \rangle|$ whereas OLS minimizes the angle between $r_Q$ and $\tilde a_i$ (this difference was already stressed and graphically illustrated in [6]). When the dictionary is close to orthogonal, e.g., for dictionaries satisfying the RIP assumption, this does not make a strong difference since $\|\tilde a_i\|$ is close to 1 for all atoms [12]. But in the general case, $\|\tilde a_i\|$ may have wider variations between 0 and 1, leading to substantial differences between the behavior of OMP and OLS.
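The difference between rules (1) and (2) can be seen on a small numerical case. The sketch below uses hypothetical data (three unit-norm atoms in $\mathbb{R}^3$, with $Q$ reduced to the first atom); it checks the identity $\|r_Q\|^2 - \|r_{Q \cup \{i\}}\|^2 = \langle r_Q, \tilde b_i \rangle^2$ and shows a situation where the two rules pick different atoms because one projected atom has a small norm.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_out(v, q):
    """Remove from v its component along the unit vector q."""
    c = dot(v, q)
    return [vi - c * qi for vi, qi in zip(v, q)]

# Hypothetical unit-norm atoms; Q = {0} is the current support
atoms = [(1.0, 0.0, 0.0), (0.8, 0.6, 0.0), (0.0, 0.6, 0.8)]
y = (1.0, 3.0, 1.0)

q0 = atoms[0]
r_Q = project_out(y, q0)                  # residual after selecting atom 0

omp_score, ols_score, gaps = {}, {}, {}
for i in (1, 2):
    a_t = project_out(atoms[i], q0)       # projected atom
    n = math.sqrt(dot(a_t, a_t))
    b_t = [x / n for x in a_t]            # normalized projected atom
    omp_score[i] = abs(dot(r_Q, a_t))     # criterion of rule (1)
    ols_score[i] = abs(dot(r_Q, b_t))     # criterion of rule (2)
    r_new = project_out(r_Q, b_t)         # residual for the support {0, i}
    decrease = dot(r_Q, r_Q) - dot(r_new, r_new)
    gaps[i] = abs(decrease - ols_score[i] ** 2)

omp_pick = max(omp_score, key=omp_score.get)
ols_pick = max(ols_score, key=ols_score.get)
```

Here atom 1 is strongly correlated with atom 0, so its projection has norm 0.6: OMP prefers atom 2 while OLS prefers atom 1.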

D. Definition of successful recovery and failure

Throughout the paper, we will use the unifying notation

$$\tilde c_i = \begin{cases} \tilde a_i & \text{for OMP}, \\ \tilde b_i & \text{for OLS} \end{cases}$$

for statements that are common to OMP and OLS.

We first stress that in special cases where the Oxx selection rule yields multiple solutions including a wrong atom, i.e., when

$$\max_{i \in Q^\star \setminus Q} |\langle r_Q, \tilde c_i \rangle| = \max_{i \notin Q^\star} |\langle r_Q, \tilde c_i \rangle|, \qquad (3)$$

we consider that Oxx automatically makes the wrong decision. Tropp used this convention for OMP and showed that in the limit case where the upper bound on his ERC condition (see Section III-A) is reached, the limit situation (3) occurs, hence a wrong atom is selected at the first iteration [1].

Let us now properly define the notions of successful support recovery and support recovery failure. 

Definition 1 [Successful recovery] Assume that $A_{Q^\star}$ is full rank. Oxx with $y \in$ span$(A_{Q^\star})$ as input succeeds if and only if there exists $j \leq \mathrm{Card}[Q^\star]$ such that all first $j$ iterations of Oxx select atoms in $Q^\star$ and the residual $r_Q$ is equal to 0 after the $j$-th iteration.

In other words, when a successful recovery occurs, the subset yielded by Oxx satisfies $Q_y \subseteq Q \subseteq Q^\star$ where $Q_y$ is the "sparsest subset", i.e., the subset of $Q^\star$ corresponding to the nonzero weights $t_i$ in the decomposition $y = A_{Q^\star} t$. When all $t_i$ are nonzero, $Q_y$ identifies with $Q^\star$ and a successful recovery coincides with the exact recovery of $Q^\star$ in $k$ iterations.

The word "failure" refers to the exact contrary of successful recovery. 

Definition 2 [Failure] Assume that $A_{Q^\star}$ is full rank. Oxx with $y \in$ span$(A_{Q^\star})$ as input fails when at least one wrong atom is selected during the first $k$ iterations. In particular, Oxx fails when (3) occurs with $r_Q \neq 0$.

III. Overview of our recovery analysis of OMP and OLS

In this section, we present our main concepts and results regarding the sparse recovery guarantees with OLS, their connection with the existing OMP results and the new results regarding OMP. For clarity reasons, we place the technical analysis including most of the proofs in the main appendix (Appendix A).

Let us first recall Tropp's ERC condition for OMP which is our starting point. 

A. Tropp's ERC condition for OMP

Theorem 1 [ERC is a sufficient recovery condition for OMP and a necessary condition at the first iteration [1, Theorems 3.1 and 3.10]] If $A_{Q^\star}$ is full rank and

$$\max_{a_{\mathrm{bad}}} \left\{ F_{Q^\star}(a_{\mathrm{bad}}) \triangleq \|A_{Q^\star}^\dagger a_{\mathrm{bad}}\|_1 \right\} < 1, \qquad \mathrm{ERC}(A, Q^\star)$$

then OMP succeeds for any input $y \in$ span$(A_{Q^\star})$. Furthermore, when ERC$(A, Q^\star)$ does not hold, there exists $y \in$ span$(A_{Q^\star})$ for which some $a_{\mathrm{bad}}$ is selected at the first iteration of OMP. When spark$(A) > 2k$, this implies that OMP cannot recover the (unique) $k$-term representation of $y$.

Note that ERC$(A, Q^\star)$ only involves the dictionary atoms since it results from a worst case analysis: if ERC$(A, Q^\star)$ holds, then a successful recovery occurs with $y = A_{Q^\star} t$ whatever $t \in \mathbb{R}^k$.

B. Main theorem 

A theorem similar to Theorem 1 applies to OLS. This is our main contribution.

Theorem 2 [ERC is a sufficient recovery condition for OLS and a necessary condition at the first iteration] If $A_{Q^\star}$ is full rank and ERC$(A, Q^\star)$ holds, then OLS succeeds for any input $y \in$ span$(A_{Q^\star})$. Furthermore, when ERC$(A, Q^\star)$ does not hold, there exists $y \in$ span$(A_{Q^\star})$ for which some $a_{\mathrm{bad}}$ is selected at the first iteration of OLS. When spark$(A) > 2k$, this implies that OLS cannot recover the (unique) $k$-term representation of $y$.

The necessary condition result is obvious since the very first iteration of OLS coincides with that of 
OMP and ERC is a necessary condition for OMP. The core of our contribution is the proof that ERC is 
a sufficient condition for the exact recovery with OLS. We now introduce the main concepts on which 
our OLS analysis relies. They also lead to a more precise analysis of OMP from the second iteration.

C. Main concepts

Let us keep in mind that ERC is a worst case necessary condition at the first iteration. But what happens when the ERC is not met but nevertheless, the first $j$ iterations of Oxx select $j$ true atoms ($j < k$)? Can we characterize the exact recovery conditions at the $(j+1)$-th iteration? We will answer these questions and provide:

1) an extension of the ERC condition to the $j$-th iteration of OMP;

2) a new necessary and sufficient condition dedicated to the $j$-th iteration of OLS.

This will allow us to prove Theorem 2 as a special case of the latter condition when $j = 0$.

In the following two paragraphs, we introduce useful notations for a single wrong atom $a_{\mathrm{bad}}$ and then define our new exact recovery conditions by considering all the wrong atoms together. $Q$ plays the role of the subset found by Oxx after the first $j$ iterations.

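1) Notations related to a single wrong atom: For $Q \subsetneq Q^\star$, we define:

$$F^{\mathrm{OMP}}_{Q^\star,Q}(a_{\mathrm{bad}}) = \sum_{i \in Q^\star \setminus Q} \left| (A_{Q^\star}^\dagger a_{\mathrm{bad}})(i) \right| \qquad (4)$$

$$F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}}) = \sum_{i \in Q^\star \setminus Q} \frac{\|\tilde a_i\|}{\|\tilde a_{\mathrm{bad}}\|} \left| (A_{Q^\star}^\dagger a_{\mathrm{bad}})(i) \right| \qquad (5)$$

when $\tilde a_{\mathrm{bad}} \neq 0$ and $F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}}) = 0$ when $\tilde a_{\mathrm{bad}} = 0$ (we recall that $\tilde a_i = P_Q^\perp a_i$ and $\tilde a_{\mathrm{bad}} = P_Q^\perp a_{\mathrm{bad}}$ depend on $Q$). Up to some manipulations on orthogonal projections, (4) and (5) can be rewritten as follows.

Lemma 1 Assume that $A_{Q^\star}$ is full rank. For $Q \subsetneq Q^\star$, $F^{\mathrm{OMP}}_{Q^\star,Q}(a_{\mathrm{bad}})$ and $F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}})$ also read

$$F^{\mathrm{OMP}}_{Q^\star,Q}(a_{\mathrm{bad}}) = \| \tilde A_{Q^\star \setminus Q}^\dagger \tilde a_{\mathrm{bad}} \|_1 \qquad (6)$$

$$F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}}) = \| \tilde B_{Q^\star \setminus Q}^\dagger \tilde b_{\mathrm{bad}} \|_1 \qquad (7)$$

where the matrices $\tilde A_{Q^\star \setminus Q} = \{\tilde a_i, i \in Q^\star \setminus Q\}$ and $\tilde B_{Q^\star \setminus Q} = \{\tilde b_i, i \in Q^\star \setminus Q\}$ of size $m \times (k - j)$ are full rank.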

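Lemma 1 is proved in Appendix B.

2) ERC-Oxx conditions for the whole dictionary: We define four binary conditions by considering all the wrong atoms together:

$$\max_{a_{\mathrm{bad}}} F^{\mathrm{OMP}}_{Q^\star,Q}(a_{\mathrm{bad}}) < 1 \qquad \text{ERC-OMP}(A, Q^\star, Q)$$

$$\max_{a_{\mathrm{bad}}} F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}}) < 1 \qquad \text{ERC-OLS}(A, Q^\star, Q)$$

$$\max_{\substack{Q \subsetneq Q^\star \\ \mathrm{Card}[Q] = j}} \max_{a_{\mathrm{bad}}} F^{\mathrm{OMP}}_{Q^\star,Q}(a_{\mathrm{bad}}) < 1 \qquad \text{ERC-OMP}(A, Q^\star, j)$$

$$\max_{\substack{Q \subsetneq Q^\star \\ \mathrm{Card}[Q] = j}} \max_{a_{\mathrm{bad}}} F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}}) < 1 \qquad \text{ERC-OLS}(A, Q^\star, j)$$

We will use the common notations $F^{\mathrm{Oxx}}_{Q^\star,Q}(a_{\mathrm{bad}})$, ERC-Oxx$(A, Q^\star, Q)$ and ERC-Oxx$(A, Q^\star, j)$ for statements that are common to both OMP and OLS.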

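Remark 1 $F^{\mathrm{OMP}}_{Q^\star,\emptyset}(a_{\mathrm{bad}})$ and $F^{\mathrm{OLS}}_{Q^\star,\emptyset}(a_{\mathrm{bad}})$ both reread $F_{Q^\star}(a_{\mathrm{bad}}) = \|A_{Q^\star}^\dagger a_{\mathrm{bad}}\|_1$ since $\tilde a_i^{\emptyset}$ reduces to $a_i$ which is unitary. Thus, ERC-Oxx$(A, Q^\star, \emptyset)$ and ERC-Oxx$(A, Q^\star, 0)$ all identify with ERC$(A, Q^\star)$.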
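The quantities (4) and (5) are straightforward to evaluate for a given triple $(Q^\star, Q, a_{\mathrm{bad}})$. The sketch below is a hedged illustration: the helper names, the use of the normal equations to apply the pseudo-inverse (adequate only for small, well-conditioned $A_{Q^\star}$), and the toy dictionary are all assumptions made for the example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(G, b):
    """Solve the square system G x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(G[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * z for x, z in zip(M[r], M[c])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def pinv_apply(cols, v):
    """Coefficients of the pseudo-inverse applied to v (full column rank assumed)."""
    gram = [[dot(ci, cj) for cj in cols] for ci in cols]
    return solve(gram, [dot(c, v) for c in cols])

def projected_norm(v, cols):
    """Norm of the projection of v onto the orthogonal complement of span(cols)."""
    if not cols:
        return math.sqrt(dot(v, v))
    x = pinv_apply(cols, v)
    r = [vi - sum(xj * c[i] for xj, c in zip(x, cols)) for i, vi in enumerate(v)]
    return math.sqrt(dot(r, r))

def criteria(atoms, q_star, q, bad):
    """Return (F_OMP, F_OLS) as defined in (4) and (5)."""
    x = pinv_apply([atoms[i] for i in q_star], atoms[bad])
    a_q = [atoms[i] for i in q]
    rest = [(i, xi) for i, xi in zip(q_star, x) if i not in q]
    f_omp = sum(abs(xi) for _, xi in rest)
    nb = projected_norm(atoms[bad], a_q)
    if nb < 1e-12:
        return f_omp, 0.0
    f_ols = sum(projected_norm(atoms[i], a_q) * abs(xi) for i, xi in rest) / nb
    return f_omp, f_ols

# Toy dictionary (hypothetical): Q* = {0, 1}, wrong atom index 2
s = 1.0 / math.sqrt(3.0)
atoms = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (s, s, s)]
f_omp_empty, f_ols_empty = criteria(atoms, [0, 1], [], 2)   # Q empty
f_omp_first, f_ols_first = criteria(atoms, [0, 1], [0], 2)  # Q = {0}
```

For $Q = \emptyset$ both values coincide with $F_{Q^\star}(a_{\mathrm{bad}}) = 2/\sqrt{3} > 1$, as stated in Remark 1, so ERC fails on this toy case, whereas both values drop below 1 once the first true atom has been selected.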

D. Sufficient conditions of exact recovery at any iteration 

The sufficient conditions of Theorems 1 and 2 reread as the special case of the following theorem where $Q = \emptyset$.

Theorem 3 [Sufficient recovery condition for Oxx after j successful iterations] Assume that $A_{Q^\star}$ is full rank. If Oxx with $y \in$ span$(A_{Q^\star})$ as input selects $Q \subsetneq Q^\star$ and ERC-Oxx$(A, Q^\star, Q)$ holds, then Oxx succeeds in the sense of Definition 1.

The following corollary is a straightforward adaptation of Theorem 3 to ERC-Oxx$(A, Q^\star, j)$.

Corollary 1 Assume that $A_{Q^\star}$ is full rank. If Oxx with $y \in$ span$(A_{Q^\star})$ as input selects true atoms during the first $j \geq 0$ iterations and ERC-Oxx$(A, Q^\star, j)$ holds, then Oxx succeeds.

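The key element which enables us to establish Theorem 3 is a recursive relation linking $F^{\mathrm{Oxx}}_{Q^\star,Q}(a_{\mathrm{bad}})$ with $F^{\mathrm{Oxx}}_{Q^\star,Q'}(a_{\mathrm{bad}})$ when $Q$ is increased by one element of $Q^\star \setminus Q$, resulting in subset $Q'$. This leads to the main technical novelty of the paper, stated in Lemma 7 (see Appendix A-A). From the thorough analysis of this recursive relation, we elaborate the following lemma which guarantees the monotonic decrease of $F^{\mathrm{Oxx}}_{Q^\star,Q}(a_{\mathrm{bad}})$ when $Q \subsetneq Q^\star$ is growing.

Lemma 2 Assume that $A_{Q^\star}$ is full rank. Let $Q \subsetneq Q' \subsetneq Q^\star$. For any $a_{\mathrm{bad}}$,

$$F^{\mathrm{OMP}}_{Q^\star,Q'}(a_{\mathrm{bad}}) \leq F^{\mathrm{OMP}}_{Q^\star,Q}(a_{\mathrm{bad}}) \qquad (8)$$

$$F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}}) < 1 \;\Rightarrow\; F^{\mathrm{OLS}}_{Q^\star,Q'}(a_{\mathrm{bad}}) \leq F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}}) \qquad (9)$$

We refer the reader to Appendix A-A for the proof of Lemmas 7 and 2 and then Theorem 3.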

E. Necessary conditions of exact recovery at any iteration 

We recall that ERC is a worst case necessary condition for the selection of a true atom by OMP and OLS in their very first iteration. We provide extended results stating that the ERC-Oxx conditions are worst case necessary conditions when the first iterations of Oxx have succeeded, up to a "reachability assumption" defined hereafter, for OMP.

Definition 3 [Reachability] Assume that $A_Q$ is full rank. $Q$ is reachable if and only if there exists an input $y = A_Q t$ where $t_i \neq 0$ for all $i$, for which Oxx recovers $Q$ in $\mathrm{Card}[Q]$ iterations. Specifically, the selection rule (1)-(2) always yields a unique maximum.

We start with the OLS condition which is simpler. 

1) OLS necessary condition: 

Theorem 4 [Necessary condition for OLS after j iterations] Let $Q \subsetneq Q^\star$ be a subset of cardinality $j$. Assume that $A_{Q^\star}$ is full rank and spark$(A) \geq j + 2$. If ERC-OLS$(A, Q^\star, Q)$ does not hold, then there exists $y \in$ span$(A_{Q^\star})$ for which OLS selects $Q$ in the first $j$ iterations and then a wrong atom $a_{\mathrm{bad}}$ at the $(j+1)$-th iteration.

Theorem 4 is proved in Appendix A-B. An obvious corollary can be obtained by replacing $Q$ with $j$, akin to the derivation of Corollary 1 from Theorem 3. From now on, such obvious corollaries will not be explicitly stated.

2) Reachability issues: The reader may have noticed that Theorem 4 implies that $Q$ can be reached by OLS at least for some input $y \in$ span$(A_{Q^\star})$. In Appendix A-B, we establish a stronger result:

Lemma 3 (Reachability by OLS) Any subset $Q$ with $\mathrm{Card}[Q] \leq$ spark$(A) - 2$ can be reached by OLS with some input $y \in$ span$(A_Q)$.

The assumption $\mathrm{Card}[Q] \leq$ spark$(A) - 2$ enables us to guarantee that the OLS selection rule (2) always yields a unique maximum (see Appendix A-B).

Perhaps surprisingly, the result of Lemma 3 does not remain valid for OMP although it holds under certain RIP assumptions [12, Theorem 4.1]. As shown in Example 1 hereafter, there are counterexamples where $Q$ cannot be reached by OMP not only for $y \in$ span$(A_Q)$ but also for any $y \in \mathbb{R}^m$. The same somewhat surprising phenomenon of non-reachability also occurs with $\ell_1$ minimization, associated to certain $k$-faces of the $\ell_1$ ball in $\mathbb{R}^n$ whose projection through $A$ yields interior faces. This result is a direct consequence of the Null Space Property [14].



Example 1 Consider the simple dictionary

$$A = \begin{bmatrix} \cos\theta_1 & \cos\theta_1 & 0 & 0 \\ -\sin\theta_1 & \sin\theta_1 & \cos\theta_2 & \cos\theta_2 \\ 0 & 0 & \sin\theta_2 & -\sin\theta_2 \end{bmatrix}$$

with $Q = \{1, 2\}$. Set $\theta_2$ to an arbitrary value in $(0, \pi/2)$. When $\theta_1 \neq 0$ is close enough to 0, OMP can never reach $Q$ in two iterations (specifically, when $y \in \mathbb{R}^3$ is proportional to neither $a_1$ nor $a_2$, $a_3$ or $a_4$ is selected in the first two iterations).

This result is proved in Section A-B3. Although in Example 1 a subset of cardinality 2 can never be reached, we remark that for undercomplete dictionaries, any subset of cardinality 2 can be reached for some $y \in \mathbb{R}^m$.
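The behavior described in Example 1 can be observed numerically for inputs in span$(A_Q)$. The sketch below is illustrative only: the values $\theta_1 = 0.1$ and $\theta_2 = \pi/4$, the weight pairs, and the function names are assumptions, and the code uses 0-based indices, so the wrong atoms $a_3$ and $a_4$ carry indices 2 and 3.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residual(y, cols, tol=1e-12):
    """Residual of y after orthogonal projection onto span(cols) (Gram-Schmidt)."""
    basis = []
    for c in cols:
        q = list(c)
        for b in basis:
            coef = dot(q, b)
            q = [qi - coef * bi for qi, bi in zip(q, b)]
        nq = math.sqrt(dot(q, q))
        if nq > tol:
            basis.append([qi / nq for qi in q])
    r = list(y)
    for b in basis:
        coef = dot(r, b)
        r = [ri - coef * bi for ri, bi in zip(r, b)]
    return r

def greedy(atoms, y, n_iter, rule):
    """n_iter iterations of OMP (rule='omp') or OLS (rule='ols')."""
    support = []
    for _ in range(n_iter):
        r = residual(y, [atoms[i] for i in support])
        candidates = [i for i in range(len(atoms)) if i not in support]
        if rule == "omp":
            best = max(candidates, key=lambda i: abs(dot(r, atoms[i])))
        else:
            def sq_err(i):
                ri = residual(y, [atoms[j] for j in support + [i]])
                return dot(ri, ri)
            best = min(candidates, key=sq_err)
        support.append(best)
    return support

def example_dictionary(theta1, theta2):
    """Columns a_1..a_4 of the dictionary of Example 1."""
    c1, s1 = math.cos(theta1), math.sin(theta1)
    c2, s2 = math.cos(theta2), math.sin(theta2)
    return [(c1, -s1, 0.0), (c1, s1, 0.0), (0.0, c2, s2), (0.0, c2, -s2)]

atoms = example_dictionary(0.1, math.pi / 4)   # assumed angle values

def combine(t1, t2):
    return [t1 * a + t2 * b for a, b in zip(atoms[0], atoms[1])]

# Several inputs in span(a_1, a_2) with both weights nonzero
weights = [(1.0, 0.5), (0.5, 1.0), (1.0, 1.0), (1.0, -1.0), (1.0, -0.3), (-2.0, 1.0)]
omp_supports = [greedy(atoms, combine(*t), 2, "omp") for t in weights]
omp_always_wrong = all(set(s) & {2, 3} for s in omp_supports)
ols_support = greedy(atoms, combine(1.0, 0.5), 2, "ols")
```

For these angles, a short calculation shows that after one true atom is selected, the residual has normalized correlation $\sin 2\theta_1 \approx 0.20$ with the remaining true atom and $\cos\theta_1 \cos\theta_2 \approx 0.70$ with each wrong atom, which explains the OMP failures, while OLS recovers both true atoms for the input with weights $(1, 0.5)$.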

3) OMP necessary conditions including reachability assumptions: Our necessary condition for OMP is somewhat tricky because we must assume that $Q$ is reachable by OMP using some input in span$(A_Q)$.

Theorem 5 [Necessary condition for OMP after j iterations] Assume that $A_{Q^\star}$ is full rank and $Q \subsetneq Q^\star$ is reachable. If ERC-OMP$(A, Q^\star, Q)$ does not hold, then there exists $y \in$ span$(A_{Q^\star})$ for which OMP selects $Q$ in the first $j$ iterations and then a wrong atom $a_{\mathrm{bad}}$ at the $(j+1)$-th iteration.

Theorem 5 is proved together with Theorem 4 in Appendix A-B. Setting aside the reachability issues, the principle of the proof is common to OMP and OLS. We go through the proof of the sufficient condition (Theorem 3) backwards, as was done in [1, Theorem 3.10] in the case $Q = \emptyset$.

In the special case where $j = 1$, Theorem 5 simplifies to a corollary similar to the OLS necessary condition (Theorem 4) because any subset $Q$ of cardinality 1 is obviously reachable using the atom indexed by $Q$ as input vector.

Corollary 2 [Necessary condition for OMP in the second iteration] Assume that $A_{Q^\star}$ is full rank and let $i \in Q^\star$. If ERC-OMP$(A, Q^\star, \{i\})$ does not hold, then there exists $y \in$ span$(A_{Q^\star})$ for which OMP selects $a_i$ and then a wrong atom $a_{\mathrm{bad}}$ in the first two iterations.

4) Discrimination between OMP and OLS at the k-th iteration: We provide an element of discrimination between OMP and OLS when their first $k - 1$ iterations have selected true atoms, so that there is one remaining true atom which has not been chosen. Let us first observe that in Example 1, OMP is not guaranteed to select the second true atom when $a_1$ or $a_2$ has already been chosen. This is actually a major difference with OLS.

Theorem 6 [Guaranteed success of the k-th iteration of OLS] If $[A_{Q^\star}, a_{\mathrm{bad}}]$ is full rank for any $a_{\mathrm{bad}}$, then ERC-OLS$(A, Q^\star, k - 1)$ is true. Thus, if the first $k - 1$ iterations of OLS select true atoms, the last true atom is necessarily selected in the $k$-th iteration.

This result is straightforward from the definition of OLS in the optimization viewpoint: "OLS selects the new atom yielding the least possible residual" and the remark that in the $k$-th iteration, the last true atom yields a zero valued residual. Another (analytical) proof of Theorem 6, given below, is based on the definition of ERC-OLS$(A, Q^\star, k - 1)$. It will enable us to understand why the statement of Theorem 6 is not valid for OMP.

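Proof: Assume that OLS yields a subset $Q \subsetneq Q^\star$ after $k - 1$ iterations. Let $a_{\mathrm{last}}$ denote the last true atom so that $A_{Q^\star} = [A_Q, a_{\mathrm{last}}]$ up to some permutation of columns. Since $\tilde B_{Q^\star \setminus Q}$ reduces to $\tilde b_{\mathrm{last}}$ and because $\tilde b_{\mathrm{last}}$ is unitary, the pseudo-inverse $\tilde B_{Q^\star \setminus Q}^\dagger$ takes the form $[\tilde b_{\mathrm{last}}]^t$. Finally, (7) simplifies to:

$$F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}}) = |\langle \tilde b_{\mathrm{last}}, \tilde b_{\mathrm{bad}} \rangle| \leq 1 \qquad (10)$$

since both vectors in the inner product are either unitary or equal to 0. Apply Lemma 8 in Appendix B: since $[A_{Q^\star}, a_{\mathrm{bad}}]$ is full rank, $[\tilde b_{\mathrm{last}}, \tilde b_{\mathrm{bad}}]$ is full rank, thus (10) is a strict inequality. ■

Similar to the calculation in the proof above, we rewrite $F^{\mathrm{OMP}}_{Q^\star,Q}(a_{\mathrm{bad}})$ defined in (6):

$$F^{\mathrm{OMP}}_{Q^\star,Q}(a_{\mathrm{bad}}) = \frac{|\langle \tilde a_{\mathrm{last}}, \tilde a_{\mathrm{bad}} \rangle|}{\|\tilde a_{\mathrm{last}}\|^2}. \qquad (11)$$

However, we cannot ensure that $F^{\mathrm{OMP}}_{Q^\star,Q}(a_{\mathrm{bad}}) \leq 1$ since the $\tilde a_i$ are not unitary vectors.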

To further distinguish OMP and OLS, we elaborate a "bad recovery condition" under which OMP is guaranteed to fail in the sense that $Q^\star$ is not reachable.

Theorem 7 [Sufficient condition for bad recovery with OMP] Assume that $A_{Q^\star}$ is full rank. If

$$\min_{\substack{Q \subsetneq Q^\star \\ \mathrm{Card}[Q] = k - 1}} \left[ \max_{a_{\mathrm{bad}}} F^{\mathrm{OMP}}_{Q^\star,Q}(a_{\mathrm{bad}}) \right] \geq 1, \qquad \text{BRC-OMP}(A, Q^\star)$$

then $Q^\star$ cannot be reached by OMP using any input in span$(A_{Q^\star})$.

Specifically, BRC-OMP$(A, Q^\star)$ guarantees that a wrong selection occurs at the $k$-th iteration when the previous iterations have succeeded.

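Proof: Assume that for some $y \in$ span$(A_{Q^\star})$, the first $k - 1$ iterations of OMP succeed, i.e., they select $Q \subsetneq Q^\star$ of cardinality $k - 1$. Let $a_{\mathrm{last}}$ denote the last true atom ($A_{Q^\star} = [A_Q, a_{\mathrm{last}}]$ up to some permutation of columns). The residual $r_Q$ yielded by OMP after $k - 1$ iterations is obviously proportional to $\tilde a_{\mathrm{last}}$.

BRC-OMP$(A, Q^\star)$ implies that ERC-OMP$(A, Q^\star, Q)$ is false, thus there exists $a_{\mathrm{bad}} \notin$ span$(A_Q)$ such that $F^{\mathrm{OMP}}_{Q^\star,Q}(a_{\mathrm{bad}}) \geq 1$. According to (11), $|\langle \tilde a_{\mathrm{last}}, \tilde a_{\mathrm{bad}} \rangle| \geq \|\tilde a_{\mathrm{last}}\|^2$, thus $|\langle r_Q, \tilde a_{\mathrm{bad}} \rangle| \geq |\langle r_Q, \tilde a_{\mathrm{last}} \rangle|$. We conclude that $a_{\mathrm{last}}$ cannot be chosen in the $k$-th iteration of OMP. ■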

Although BRC-OMP$(A, Q^\star)$ may appear restrictive (as a minimum is involved in the left-hand side), we will see in Section IV that it may frequently be met, even when the atoms of $A$ are not strongly correlated.
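For $k = 2$, the left-hand side of BRC-OMP$(A, Q^\star)$ can be evaluated directly from (11), since each candidate $Q$ contains a single atom. The following sketch assumes this special case only; the function names and the test dictionary (the dictionary of Example 1 with the assumed values $\theta_1 = 0.1$, $\theta_2 = \pi/4$, using 0-based indices) are illustrative choices.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_out_atom(v, a):
    """Component of v orthogonal to the single atom a."""
    coef = dot(v, a) / dot(a, a)
    return [vi - coef * ai for vi, ai in zip(v, a)]

def brc_omp_value_k2(atoms, q_star):
    """Left-hand side of BRC-OMP for Card[Q*] = 2:
    min over Q = {q} of the max over wrong atoms of F^OMP given by (11)."""
    values = []
    for q in q_star:
        last = next(i for i in q_star if i != q)          # remaining true atom
        a_last = project_out_atom(atoms[last], atoms[q])
        denom = dot(a_last, a_last)                       # nonzero if A_{Q*} is full rank
        worst = max(
            abs(dot(a_last, project_out_atom(atoms[b], atoms[q]))) / denom
            for b in range(len(atoms)) if b not in q_star
        )
        values.append(worst)
    return min(values)

theta1, theta2 = 0.1, math.pi / 4                         # assumed values
c1, s1 = math.cos(theta1), math.sin(theta1)
c2, s2 = math.cos(theta2), math.sin(theta2)
atoms = [(c1, -s1, 0.0), (c1, s1, 0.0), (0.0, c2, s2), (0.0, c2, -s2)]

value = brc_omp_value_k2(atoms, [0, 1])
brc_holds = value >= 1.0
```

For this dictionary the value works out to $\cos\theta_2 / (2 \sin\theta_1) \approx 3.54$, so the bad recovery condition holds and the pair of true atoms cannot be reached by OMP from any input in their span.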



IV. Empirical comparison of the OMP and OLS exact recovery conditions 

The purpose of this section is to test whether there is some systematic implication between the conditions ERC-OMP$(A, Q^\star, Q)$ and ERC-OLS$(A, Q^\star, Q)$, and between ERC-OMP$(A, Q^\star, j)$ and ERC-OLS$(A, Q^\star, j)$. We set $j = \mathrm{Card}[Q] = 1$. Additionally, we will emphasize the distinction between OMP and OLS by evaluating the bad recovery condition for OMP. These empirical comparisons involve Matlab simulations with random dictionaries.



A. Comparison of the ERC-Oxx conditions 

We compare ERC-OMP$(A, Q^\star, Q)$ and ERC-OLS$(A, Q^\star, Q)$ for a common dictionary and a given pair of subsets where $Q \subsetneq Q^\star$ is of cardinality 1. As the recovery conditions take the form "for all $a_{\mathrm{bad}}$, $F^{\mathrm{Oxx}}_{Q^\star,Q}(a_{\mathrm{bad}}) < 1$", it is sufficient to just consider the case where there is one wrong atom $a_{\mathrm{bad}}$. Therefore, we consider dictionaries $A$ with $k + 1$ atoms, with $k = \mathrm{Card}[Q^\star]$. Evaluating ERC$(A, Q^\star)$, ERC-OMP$(A, Q^\star, Q)$ and ERC-OLS$(A, Q^\star, Q)$ amounts to computing $F_{Q^\star}(a_{\mathrm{bad}})$, $F^{\mathrm{OMP}}_{Q^\star,Q}(a_{\mathrm{bad}})$ and $F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}})$ and to testing whether they are lower than 1.

Fig. 1. Comparison of the OMP and OLS exact recovery conditions. We draw 10,000 Gaussian dictionaries of size $100 \times 11$ and set $k = 10$ so that there is only one wrong atom $a_{\mathrm{bad}}$. $Q$ is always set to the first atom ($\mathrm{Card}[Q] = 1$). Plot of (a) $F_{Q^\star}(a_{\mathrm{bad}})$ vs $F^{\mathrm{OMP}}_{Q^\star,Q}(a_{\mathrm{bad}})$; (b) $F_{Q^\star}(a_{\mathrm{bad}})$ vs $F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}})$; (c) $F^{\mathrm{OMP}}_{Q^\star,Q}(a_{\mathrm{bad}})$ vs $F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}})$. For the last subfigure, we keep the trials for which $F_{Q^\star}(a_{\mathrm{bad}}) \geq 1$.



Fig. 1 is a scatter plot of the three criteria for 10,000 Gaussian dictionaries $A$ of size $100 \times 11$, where the elements of $A$ are drawn according to an i.i.d. Gaussian distribution. The subset $Q = \{1\}$ is systematically chosen as the first atom of $A$. Figs. 1(a,b) are in good agreement with Lemma 2: we verify that $F^{\mathrm{OMP}}_{Q^\star,Q}(a_{\mathrm{bad}}) \leq F_{Q^\star}(a_{\mathrm{bad}})$ whether ERC holds or not, and that $F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}}) \leq F_{Q^\star}(a_{\mathrm{bad}})$ systematically occurs only when $F_{Q^\star}(a_{\mathrm{bad}}) < 1$. On Fig. 1(c) displaying $F^{\mathrm{OMP}}_{Q^\star,Q}(a_{\mathrm{bad}})$ versus $F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}})$, we only keep the trials for which $F_{Q^\star}(a_{\mathrm{bad}}) \geq 1$, i.e., ERC$(A, Q^\star)$ does not hold. Since both south-east and north-west quarter planes are populated, we conclude that neither OMP nor OLS is uniformly better. To be more specific, when ERC-OMP$(A, Q^\star, Q)$ holds but ERC-OLS$(A, Q^\star, Q)$ does not, there exists an input $y \in$ span$(A_{Q^\star})$ for which OLS selects $Q = \{1\}$ and then a wrong atom in the first two iterations (Theorem 4). On the contrary, OMP is guaranteed to perform an exact recovery with this input according to Theorem 3. The same situation can occur when inverting the roles of OMP and OLS according to Corollary 2 and Theorem 3.

Fig. 2. Computation of the bad recovery condition BRC-OMP$(A, Q^\star)$ for Gaussian dictionaries of various sizes $(m, n)$. 1,000 trials are performed for each size, and $Q^\star$ is always set to the first two atoms ($k = 2$). The grey levels in the image correspond to the rate of guaranteed failure, i.e., the proportion of trials where BRC-OMP$(A, Q^\star)$ holds.

We have compared ERC-OMP(A, Q*, 1) and ERC-OLS(A, Q*, 1) which take into account all the 
possible subsets of Q* of cardinality 1. Again, we found that when ERC(A, Q*) is not met, it can occur 
that ERC-OMP(A, Q*, 1) holds while ERC-OLS(A, Q*, 1) does not and vice versa. 

Note that this analysis becomes more complex when $\mathrm{Card}[Q] \geq 2$ since ERC-OMP$(A, Q^\star, Q)$ alone is not a necessary condition for OMP anymore (Theorem 5 also involves the assumption that $Q$ is reachable).
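The comparison carried out in this subsection can be imitated with a short Monte Carlo loop. The sketch below is not the Matlab code used for the figures: the function names, the seed, and the reduced default sizes (20 x 6 dictionaries, 200 trials, chosen so that the pure-Python code runs quickly) are assumptions. Each trial draws a Gaussian dictionary with normalized columns, takes the last atom as the single wrong atom and the first atom as $Q$, and returns the three criteria.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(G, b):
    """Solve the square system G x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(G[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * z for x, z in zip(M[r], M[c])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def random_unit_atoms(m, n, rng):
    """n atoms of dimension m with i.i.d. Gaussian entries, normalized to unit norm."""
    atoms = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        nv = math.sqrt(dot(v, v))
        atoms.append([x / nv for x in v])
    return atoms

def three_criteria(atoms):
    """(F, F_OMP, F_OLS) with Q* = all atoms but the last and Q = {first atom}."""
    true_atoms, bad = atoms[:-1], atoms[-1]
    gram = [[dot(a, b) for b in true_atoms] for a in true_atoms]
    x = solve(gram, [dot(a, bad) for a in true_atoms])   # pseudo-inverse applied to a_bad
    a0 = true_atoms[0]

    def proj_norm(v):
        # Norm of v after removing its component along the unit-norm atom a0
        c = dot(v, a0)
        return math.sqrt(max(dot(v, v) - c * c, 0.0))

    f_erc = sum(abs(xi) for xi in x)
    f_omp = sum(abs(xi) for xi in x[1:])
    f_ols = sum(proj_norm(a) * abs(xi) for a, xi in zip(true_atoms[1:], x[1:])) / proj_norm(bad)
    return f_erc, f_omp, f_ols

def run(trials=200, m=20, n=6, seed=0):
    rng = random.Random(seed)
    return [three_criteria(random_unit_atoms(m, n, rng)) for _ in range(trials)]

results = run()
# Among the trials where ERC fails, count the two quadrants discussed in the text
failing = [(fo, fl) for fe, fo, fl in results if fe >= 1.0]
omp_only = sum(1 for fo, fl in failing if fo < 1.0 <= fl)
ols_only = sum(1 for fo, fl in failing if fl < 1.0 <= fo)
```

The counters `omp_only` and `ols_only` correspond to the two quarter planes of the third panel of Fig. 1; their values depend on the seed and on the chosen sizes, so no particular outcome is claimed here.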

B. Discrimination at the second iteration 

Because the above simulation cannot discriminate OMP and OLS, we consider the bad recovery condition BRC-OMP$(A, Q^\star)$ under which OMP is guaranteed to fail when $k$ iterations are performed. Meanwhile, OLS recovers $Q^\star$ at least for some input in span$(A_{Q^\star})$. Moreover, the $k$-th iteration of OLS is guaranteed to succeed provided that the first $k - 1$ iterations have succeeded according to Theorem 6.

We compute BRC-OMP$(A, Q^\star)$ in the case $k = 2$ for various dictionary sizes (see Fig. 2). We perform 1,000 trials per size $(m, n)$ in which the elements of $A$ are drawn according to an i.i.d. Gaussian distribution and $Q^\star$ is always set to the first two atoms. We notice that BRC-OMP$(A, Q^\star)$ may frequently be met for overcomplete dictionaries, especially when $m$ is low and $n \gg m$. Because $k = 2$, OLS performs at least as well as OMP: when the first iteration (common to both algorithms) has succeeded, OLS cannot fail according to Theorem 6 while OMP is guaranteed to fail in cases where the BRC holds.

This simulation can naturally be extended to the case k > 2 but the conclusions differ. OLS is not
guaranteed to outperform OMP for any $y \in \mathrm{span}(A_{Q^\star})$, but when BRC-OMP(A, Q*) is met, OLS
recovers Q* for some inputs while OMP cannot for any input.

V. Conclusions 

Our first contribution is an original analysis of OLS based on the extension of Tropp's ERC condition. 
We showed that when ERC holds, OLS is guaranteed to yield an exact support recovery. Although OLS 
has been acknowledged in several communities for two decades, such a theoretical analysis was lacking. 
Our second contribution is a parallel study of OMP and OLS when a number of iterations have been 
performed and true atoms have been selected. We found that neither OMP nor OLS is uniformly better. 
In particular, we showed using simulated dictionaries that when the ERC is not met but the first iteration 
(which is common to OMP and OLS) selects a true atom, there are counter-examples for which OMP is 
guaranteed to yield an exact support recovery while OLS does not, and vice versa. 

Finally, a few elements of analysis suggest that OLS behaves better than OMP. First, any subset Q can
be reached by OLS using some input in $\mathrm{span}(A_Q)$ while for some dictionaries, it may occur that some
subsets are never reached by OMP for any $y \in \mathbb{R}^m$. In other words, OLS has a stronger capability of
exploration. Secondly, when all true atoms except one have been found by OLS and no wrong selection
occurred, OLS is guaranteed to find the last true atom in the following iteration while OMP may fail.

For realistic problems where the data are noisy and the dictionary is far from orthogonal, empirical 
studies report that OLS usually outperforms OMP for a larger numerical cost [9,11]. In our experience, 
OLS yields a residual error which may be by far lower than that of OMP after the same number of 
iterations [15]. Moreover, it performs better support recoveries in terms of ratio between the number of 
good detections and of false alarms [16]. We believe that the reason why our exact recovery analysis 
does not corroborate this trend is that it is essentially based on a worst case analysis. An interesting 
perspective will consist of a theoretical study in the average case in order to evaluate more thoroughly 
the difference between OMP and OLS. 
Appendix A

Necessary and sufficient conditions of exact recovery for OMP and OLS 
This appendix includes the complete analysis of our OMP and OLS recovery conditions. 

A. Sufficient conditions 

We show that when Oxx happens to select true atoms during its early iterations, it is guaranteed to
recover the whole unknown support in the subsequent iterations when the ERC-Oxx(A, Q*, Q) condition
is fulfilled. We establish Theorem 3 whose direct consequence is Theorem 2 stating that when ERC(A, Q*)
holds, OLS is guaranteed to succeed.

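1) ERC-Oxx are sufficient recovery conditions at a given iteration: We follow the analysis of [1, Theorem 3.1] to extend Tropp's exact recovery condition to a sufficient condition dedicated to the (j + 1)-th iteration of Oxx.

Lemma 4 Assume that $A_{Q^\star}$ is full rank. If Oxx with $y \in \mathrm{span}(A_{Q^\star})$ as input selects j true atoms $Q \subsetneq Q^\star$ and ERC-Oxx(A, Q*, Q) holds, then the (j + 1)-th iteration of Oxx selects a true atom.

Proof: According to the selection rule (1)-(2), Oxx selects a true atom at iteration j + 1 if and only if

$$\phi(r_Q) = \frac{\max_{i \notin Q^\star} |\langle r_Q, \tilde c_i \rangle|}{\max_{i \in Q^\star \setminus Q} |\langle r_Q, \tilde c_i \rangle|} < 1. \qquad (12)$$

Let us gather the vectors $\tilde c_i$ indexed by $i \notin Q^\star$ and $i \in Q^\star \setminus Q$ in two matrices $\tilde C_{\mathrm{bad}}$ and $\tilde C_{Q^\star \setminus Q}$ of dimensions m x (n - k) and m x (k - j), respectively. The condition (12) rereads:

$$\phi(r_Q) = \frac{\|\tilde C_{\mathrm{bad}}^t r_Q\|_\infty}{\|\tilde C_{Q^\star \setminus Q}^t r_Q\|_\infty} < 1.$$

Following Tropp's analysis, we re-arrange the vector $r_Q$ occurring in the numerator. Since $r_Q = P_Q^\perp y$ and $y \in \mathrm{span}(A_{Q^\star})$, $r_Q \in \mathrm{span}(\tilde A_{Q^\star \setminus Q}) = \mathrm{span}(\tilde C_{Q^\star \setminus Q})$. We rewrite $r_Q$ as $\tilde P_{Q^\star \setminus Q} r_Q$ where $\tilde P_{Q^\star \setminus Q}$ stands for the orthogonal projection on $\mathrm{span}(\tilde C_{Q^\star \setminus Q})$: $\tilde P_{Q^\star \setminus Q} = \tilde P_{Q^\star \setminus Q}^t = (\tilde C_{Q^\star \setminus Q}^\dagger)^t \tilde C_{Q^\star \setminus Q}^t$. $\phi(r_Q)$ rereads

$$\phi(r_Q) = \frac{\|\tilde C_{\mathrm{bad}}^t (\tilde C_{Q^\star \setminus Q}^\dagger)^t \tilde C_{Q^\star \setminus Q}^t r_Q\|_\infty}{\|\tilde C_{Q^\star \setminus Q}^t r_Q\|_\infty}.$$

This expression can be majorized using the matrix norm:

$$\phi(r_Q) \leq \|(\tilde C_{Q^\star \setminus Q}^\dagger \tilde C_{\mathrm{bad}})^t\|_{\infty,\infty}. \qquad (13)$$

Since the $\ell_\infty$ norm of a matrix is equal to the $\ell_1$ norm of its transpose and $\|\cdot\|_{1,1}$ equals the maximum column sum of the absolute value of its argument [1, Theorem 3.1], the upper bound of (13) rereads

$$\|\tilde C_{Q^\star \setminus Q}^\dagger \tilde C_{\mathrm{bad}}\|_{1,1} = \max_{\tilde c_{\mathrm{bad}}} \|\tilde C_{Q^\star \setminus Q}^\dagger \tilde c_{\mathrm{bad}}\|_1 = \max_{a_{\mathrm{bad}}} F^{\mathrm{Oxx}}_{Q^\star,Q}(a_{\mathrm{bad}})$$

according to Lemma 1.

By definition of ERC-Oxx(A, Q*, Q), this upper bound is lower than 1, thus $\phi(r_Q) < 1$. According to (12), Oxx selects a true atom. ■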

2) Recursive expression of the ERC-Oxx formulas: We elaborate recursive expressions of $F^{\mathrm{Oxx}}_{Q^\star,Q}(a_{\mathrm{bad}})$ when Q is increased by one element resulting in the new subset $Q' \subsetneq Q^\star$ (here, we do not consider the case where $Q' = Q^\star$ since $F^{\mathrm{Oxx}}_{Q^\star,Q^\star}(a_{\mathrm{bad}})$ is not properly defined, (10) and (11) being empty sums). We will use the notations $Q' = Q \cup \{i_{\mathrm{new}}\}$ where $i_{\mathrm{new}} \in Q^\star \setminus Q$ and $a_{\mathrm{new}} = a_{i_{\mathrm{new}}}$. To avoid any confusion, $\tilde a_i$ will be systematically replaced by $\tilde a_i^{Q}$ and $\tilde a_i^{Q'}$ to express the dependence upon Q and Q', respectively. In the same way, $\tilde b_i$ will be replaced by $\tilde b_i^{Q}$ or $\tilde b_i^{Q'}$ but for simplicity, we will keep the matrix notations $\tilde B_{Q^\star \setminus Q}$ and $\tilde B_{Q^\star \setminus Q'}$ without superscript, the tilde referring to Q and Q', respectively.

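Let us first link $\tilde b_i^{Q}$ to $\tilde b_i^{Q'}$ when $\tilde a_i^{Q'} \neq 0$.

Lemma 5 Assume that $A_{Q'}$ is full rank and $Q' = Q \cup \{i_{\mathrm{new}}\}$. Then, $\mathrm{span}(A_Q)^\perp$ is the orthogonal direct sum of the subspaces $\mathrm{span}(A_{Q'})^\perp$ and $\mathrm{span}(\tilde a_{\mathrm{new}}^{Q})$, and the normalized projection of any atom $a_i \notin \mathrm{span}(A_{Q'})$ takes the form:

$$\tilde b_i^{Q} = \eta_i^{Q,Q'} \tilde b_i^{Q'} + \chi_i^{Q,Q'} \tilde b_{\mathrm{new}}^{Q} \qquad (14)$$

where

$$\eta_i^{Q,Q'} = \frac{\|\tilde a_i^{Q'}\|}{\|\tilde a_i^{Q}\|} \in (0, 1], \qquad (15)$$

$$\chi_i^{Q,Q'} = \langle \tilde b_i^{Q}, \tilde b_{\mathrm{new}}^{Q} \rangle, \qquad (16)$$

$$(\eta_i^{Q,Q'})^2 + (\chi_i^{Q,Q'})^2 = 1. \qquad (17)$$

Proof: Since $Q \subset Q'$, we have $\mathrm{span}(A_{Q'})^\perp \subset \mathrm{span}(A_Q)^\perp$. Because $A_{Q'}$ is full rank, $\mathrm{span}(A_{Q'})^\perp$ and $\mathrm{span}(A_Q)^\perp$ are of consecutive dimensions. Moreover, $\tilde a_{\mathrm{new}}^{Q} = a_{\mathrm{new}} - P_Q a_{\mathrm{new}} \in \mathrm{span}(A_{Q'}) \cap \mathrm{span}(A_Q)^\perp$, and $\tilde a_{\mathrm{new}}^{Q} \neq 0$ since $A_{Q'}$ is full rank. As a vector of $\mathrm{span}(A_{Q'})$, $\tilde a_{\mathrm{new}}^{Q}$ is orthogonal to $\mathrm{span}(A_{Q'})^\perp$. It follows that $\mathrm{span}(\tilde a_{\mathrm{new}}^{Q})$ is the orthogonal complement of $\mathrm{span}(A_{Q'})^\perp$ in $\mathrm{span}(A_Q)^\perp$. The orthogonal decomposition of $\tilde a_i^{Q} = P_Q^\perp a_i$ reads:

$$\tilde a_i^{Q} = \tilde a_i^{Q'} + \langle \tilde a_i^{Q}, \tilde b_{\mathrm{new}}^{Q} \rangle \tilde b_{\mathrm{new}}^{Q}$$

since $\tilde b_{\mathrm{new}}^{Q}$ is unitary. Replacing $\tilde a_i^{Q} = \|\tilde a_i^{Q}\| \tilde b_i^{Q}$ and $\tilde a_i^{Q'} = \|\tilde a_i^{Q'}\| \tilde b_i^{Q'}$ yields (14)-(16). Pythagoras' theorem yields (17). The assumption $a_i \notin \mathrm{span}(A_{Q'})$ implies that $\tilde a_i^{Q'} \neq 0$, then $\eta_i^{Q,Q'} > 0$. ■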
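
The decomposition of Lemma 5 can be checked numerically. The following is a minimal sketch on a random Gaussian dictionary, assuming that $\eta$ is the ratio of the norms of the two projected atoms and that $\chi$ is the inner product of the normalized projections, as in the lemma. The helper name `projected`, the dictionary size and the index choices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 12
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)           # unit-norm atoms

def projected(A, Q, i):
    """Projection of atom i onto span(A_Q)^perp."""
    AQ = A[:, Q]
    return A[:, i] - AQ @ (np.linalg.pinv(AQ) @ A[:, i])

Q, i_new, i = [0, 1], 2, 5               # hypothetical index choices
Qp = Q + [i_new]                         # Q' = Q U {i_new}

a_Q, a_Qp = projected(A, Q, i), projected(A, Qp, i)
b_Q = a_Q / np.linalg.norm(a_Q)          # normalized projection w.r.t. Q
b_Qp = a_Qp / np.linalg.norm(a_Qp)       # normalized projection w.r.t. Q'
a_new = projected(A, Q, i_new)
b_new = a_new / np.linalg.norm(a_new)

eta = np.linalg.norm(a_Qp) / np.linalg.norm(a_Q)   # ratio of norms
chi = b_Q @ b_new                                  # inner product

print(np.allclose(b_Q, eta * b_Qp + chi * b_new), eta**2 + chi**2)
```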

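Lemma 6 Assume that $A_{Q^\star}$ is full rank. Let $Q' \subsetneq Q^\star$ with $Q' = Q \cup \{i_{\mathrm{new}}\}$. Then, $\mathrm{span}(\tilde B_{Q^\star \setminus Q})$ is the orthogonal direct sum of $\mathrm{span}(\tilde B_{Q^\star \setminus Q'})$ and $\mathrm{span}(\tilde b_{\mathrm{new}}^{Q})$.

Proof: According to Corollary 3 in Appendix B, $\tilde B_{Q^\star \setminus Q}$ and $\tilde B_{Q^\star \setminus Q'}$ are full rank matrices, thus their column spans are of consecutive dimensions. Lemma 5 states that $\tilde b_{\mathrm{new}}^{Q}$ is orthogonal to $\mathrm{span}(A_{Q'})^\perp$, thus it is orthogonal to $\tilde b_i^{Q'} \in \mathrm{span}(A_{Q'})^\perp$ for all $i \in Q^\star \setminus Q'$. ■

In the following lemma, we establish a link between $F^{\mathrm{Oxx}}_{Q^\star,Q}(a_{\mathrm{bad}})$ and $F^{\mathrm{Oxx}}_{Q^\star,Q'}(a_{\mathrm{bad}})$. It is a simple recursive relation in the case of OMP. For OLS, we cannot directly relate the two quantities but we express $F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}}) = \|\tilde B_{Q^\star \setminus Q}^\dagger \tilde b_{\mathrm{bad}}^{Q}\|_1$ with respect to $\tilde B_{Q^\star \setminus Q'}^\dagger \tilde b_{\mathrm{bad}}^{Q'}$.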

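Lemma 7 Assume that $A_{Q^\star}$ is full rank. Let $Q \subsetneq Q' \subsetneq Q^\star$ with $Q' = Q \cup \{i_{\mathrm{new}}\}$. When $a_{\mathrm{bad}} \notin \mathrm{span}(A_{Q'})$,

$$F^{\mathrm{OMP}}_{Q^\star,Q}(a_{\mathrm{bad}}) = F^{\mathrm{OMP}}_{Q^\star,Q'}(a_{\mathrm{bad}}) + |(A_{Q^\star}^\dagger a_{\mathrm{bad}})(i_{\mathrm{new}})| \qquad (18)$$

$$F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}}) = \Big| \chi_{\mathrm{bad}}^{Q,Q'} - \eta_{\mathrm{bad}}^{Q,Q'} \sum_{i \in Q^\star \setminus Q'} \frac{\chi_i^{Q,Q'}}{\eta_i^{Q,Q'}} \beta_{\mathrm{bad}}^{Q'}(i) \Big| + \eta_{\mathrm{bad}}^{Q,Q'} \sum_{i \in Q^\star \setminus Q'} \frac{|\beta_{\mathrm{bad}}^{Q'}(i)|}{\eta_i^{Q,Q'}} \qquad (19)$$

where $\eta_i^{Q,Q'}$ and $\chi_i^{Q,Q'}$ are defined in (15)-(16) and $\beta_{\mathrm{bad}}^{Q'} = \tilde B_{Q^\star \setminus Q'}^\dagger \tilde b_{\mathrm{bad}}^{Q'}$.

Proof: (18) straightforwardly follows from the definition (10) of $F^{\mathrm{OMP}}_{Q^\star,Q'}(a_{\mathrm{bad}})$.

Let us now establish (19). We denote by $\tilde P_{Q^\star \setminus Q}$ and $\tilde P_{Q^\star \setminus Q'}$ the orthogonal projectors on $\mathrm{span}(\tilde B_{Q^\star \setminus Q})$ and $\mathrm{span}(\tilde B_{Q^\star \setminus Q'})$. Because $\mathrm{span}(\tilde B_{Q^\star \setminus Q})$ is the orthogonal direct sum of $\mathrm{span}(\tilde B_{Q^\star \setminus Q'})$ and $\mathrm{span}(\tilde b_{\mathrm{new}}^{Q})$ (Lemma 6), we have the orthogonal decomposition:

$$\tilde P_{Q^\star \setminus Q} \tilde b_{\mathrm{bad}}^{Q} = \tilde P_{Q^\star \setminus Q'} \tilde b_{\mathrm{bad}}^{Q} + \chi_{\mathrm{bad}}^{Q,Q'} \tilde b_{\mathrm{new}}^{Q}.$$

(14) yields

$$\tilde P_{Q^\star \setminus Q} \tilde b_{\mathrm{bad}}^{Q} = \eta_{\mathrm{bad}}^{Q,Q'} \tilde P_{Q^\star \setminus Q'} \tilde b_{\mathrm{bad}}^{Q'} + \chi_{\mathrm{bad}}^{Q,Q'} \tilde b_{\mathrm{new}}^{Q}$$

($\tilde P_{Q^\star \setminus Q'} \tilde b_{\mathrm{new}}^{Q} = 0$ according to Lemma 6) and then

$$\tilde P_{Q^\star \setminus Q} \tilde b_{\mathrm{bad}}^{Q} = \eta_{\mathrm{bad}}^{Q,Q'} \sum_{i \in Q^\star \setminus Q'} \beta_{\mathrm{bad}}^{Q'}(i) \, \tilde b_i^{Q'} + \chi_{\mathrm{bad}}^{Q,Q'} \tilde b_{\mathrm{new}}^{Q}$$

by definition of $\beta_{\mathrm{bad}}^{Q'}$. In the latter equation, we re-express $\tilde b_i^{Q'}$ with respect to $\tilde b_i^{Q}$ using (14):

$$\tilde P_{Q^\star \setminus Q} \tilde b_{\mathrm{bad}}^{Q} = \eta_{\mathrm{bad}}^{Q,Q'} \sum_{i \in Q^\star \setminus Q'} \frac{\beta_{\mathrm{bad}}^{Q'}(i)}{\eta_i^{Q,Q'}} \tilde b_i^{Q} + \Big\{ \chi_{\mathrm{bad}}^{Q,Q'} - \eta_{\mathrm{bad}}^{Q,Q'} \sum_{i \in Q^\star \setminus Q'} \frac{\beta_{\mathrm{bad}}^{Q'}(i) \, \chi_i^{Q,Q'}}{\eta_i^{Q,Q'}} \Big\} \tilde b_{\mathrm{new}}^{Q}.$$

We conclude that $F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}}) = \|\tilde B_{Q^\star \setminus Q}^\dagger \tilde b_{\mathrm{bad}}^{Q}\|_1$ reads (19). ■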

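3) ERC is a sufficient recovery condition for OLS: The key result of Lemma 2 (see Section III-D) states that $F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}})$ is decreasing when $Q \subsetneq Q^\star$ is growing provided that $F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}}) < 1$, and that $F^{\mathrm{OMP}}_{Q^\star,Q}(a_{\mathrm{bad}})$ is always decreasing.

Proof of Lemma 2: It is sufficient to prove the result when Card[Q'] = Card[Q] + 1. The case Card[Q'] > Card[Q] + 1 follows from the former case by recursion.

Let $Q \subsetneq Q' \subsetneq Q^\star$ with Card[Q'] = Card[Q] + 1. The result is obvious when $a_{\mathrm{bad}} \in \mathrm{span}(A_{Q'})$: $\tilde a_{\mathrm{bad}}^{Q'} = 0$, then $F^{\mathrm{Oxx}}_{Q^\star,Q'}(a_{\mathrm{bad}}) = 0$. When $a_{\mathrm{bad}} \notin \mathrm{span}(A_{Q'})$, the OMP statement directly follows from (18). The proof of the OLS statement relies on the study of the function $\varphi(\eta) = |\sqrt{1 - \eta^2} - C\eta| + D\eta$ which is fully defined in (25), (26) and (27) in Appendix C. Because this study is rather technical, we place it in Appendix C.

We notice that $F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}})$ given in (19) takes the form $\varphi(\eta_{\mathrm{bad}}^{Q,Q'})$ where the variables occurring in C and D (see (26) and (27)) are set to $N \leftarrow \mathrm{Card}[Q^\star \setminus Q']$, $\eta_i \leftarrow \eta_i^{Q,Q'}$, $\chi_i \leftarrow \chi_i^{Q,Q'}$ and $\beta \leftarrow \mathrm{sgn}(\chi_{\mathrm{bad}}^{Q,Q'}) \, \beta_{\mathrm{bad}}^{Q'}$. Now, we invoke Lemma 14 in Appendix C: as $F^{\mathrm{OLS}}_{Q^\star,Q'}(a_{\mathrm{bad}}) = \|\beta_{\mathrm{bad}}^{Q'}\|_1$ plays the role of $\|\beta\|_1$, $F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}}) < 1$ implies that $F^{\mathrm{OLS}}_{Q^\star,Q'}(a_{\mathrm{bad}}) \leq F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}})$. ■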

We deduce from Lemmas 2 and 4 that ERC-Oxx(A, Q*, Q) are sufficient recovery conditions when $Q \subsetneq Q^\star$ has been reached (Theorem 3).

Proof of Theorem 3: Apply Lemma 4 at each iteration j, ..., k - 1 until the increased subset Q' matches Q*. The ERC-Oxx(A, Q*, ·) assumption of Lemma 4 is always fulfilled according to Lemma 2. ■

Finally, we prove that ERC(A, Q*) is a necessary and sufficient condition of successful recovery for
OLS (Theorem 2).

Proof of Theorem 2: The sufficient condition is a special case of Theorem 3 for $Q = \emptyset$. The necessary
condition identifies with that of Theorem 2 since ERC-OLS(A, Q*, $\emptyset$) simplifies to ERC(A, Q*). ■

B. Necessary conditions 

We provide the technical analysis to prove that ERC-Oxx(A, Q*, Q) is not only a sufficient condition of exact recovery in the worst case when $Q \subsetneq Q^\star$ has been reached, but also a necessary condition. We will prove Theorems 4 and 5 (see Section III-D) generalizing Tropp's necessary condition [1, Theorem 3.10] to any iteration of OMP and OLS.

We proceed in two stages. In the first stage, we assume that Oxx exactly recovers $Q \subsetneq Q^\star$ in j = Card[Q] iterations with some input vector in $\mathrm{span}(A_Q)$. This reachability assumption allows us to carry out a parallel analysis of OMP and OLS (subsection A-B1) leading to the following proposition.

Proposition 1 [Necessary condition for Oxx after j iterations] Assume that $A_{Q^\star}$ is full rank and $Q \subsetneq Q^\star$ is reachable from an input in $\mathrm{span}(A_Q)$ by Oxx. If ERC-Oxx(A, Q*, Q) does not hold, then there exists $y \in \mathrm{span}(A_{Q^\star})$ for which Oxx selects Q in the first j iterations and then a wrong atom $a_{\mathrm{bad}}$ in the (j + 1)-th iteration.

This proposition coincides with Theorem 5 in the case of OMP whereas for OLS, Theorem 4 does not require the assumption that Q is reachable.

The second stage investigates whether the reachability assumption is automatically fulfilled or not (see subsections A-B2 and A-B3 for OLS and OMP, respectively).

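1) Parallel analysis of OMP and OLS: Proof of Proposition 1: We proceed the proof of Lemma 4 backwards. By assumption, the right hand-side of inequality (13) is equal to

$$\|(\tilde C_{Q^\star \setminus Q}^\dagger \tilde C_{\mathrm{bad}})^t\|_{\infty,\infty} = \max_{a_{\mathrm{bad}}} F^{\mathrm{Oxx}}_{Q^\star,Q}(a_{\mathrm{bad}}) \geq 1.$$

By definition of induced norms, there exists a vector $v \in \mathbb{R}^{k-j}$ satisfying $v \neq 0$ and

$$\frac{\|(\tilde C_{Q^\star \setminus Q}^\dagger \tilde C_{\mathrm{bad}})^t v\|_\infty}{\|v\|_\infty} = \|(\tilde C_{Q^\star \setminus Q}^\dagger \tilde C_{\mathrm{bad}})^t\|_{\infty,\infty} \geq 1. \qquad (20)$$

Define

$$y = A_{Q^\star \setminus Q} (\tilde C_{Q^\star \setminus Q}^t \tilde A_{Q^\star \setminus Q})^{-1} v. \qquad (21)$$

The matrix inversion in (21) is well defined since $\tilde A_{Q^\star \setminus Q}$ is full rank (Corollary 3 in Appendix B) and $\tilde C_{Q^\star \setminus Q} = \tilde A_{Q^\star \setminus Q}$ or $\tilde B_{Q^\star \setminus Q}$ reads as the right product of $\tilde A_{Q^\star \setminus Q}$ with a nondegenerate diagonal matrix. By taking into account that $\tilde A_{Q^\star \setminus Q} = P_Q^\perp A_{Q^\star \setminus Q}$, we obtain that

$$v = \tilde C_{Q^\star \setminus Q}^t P_Q^\perp y. \qquad (22)$$

Since the left hand-side of (20) identifies with $\phi(P_Q^\perp y)$ where $\phi$ is defined in (12), (20) yields:

$$\max_{i \in Q^\star \setminus Q} |\langle P_Q^\perp y, \tilde c_i \rangle| \leq \max_{i \notin Q^\star} |\langle P_Q^\perp y, \tilde c_i \rangle|. \qquad (23)$$

Moreover, we have $P_Q^\perp y \neq 0$ according to (22) and $v \neq 0$.

Now, let $z \in \mathrm{span}(A_Q)$ denote the input for which Oxx recovers Q. According to Lemma 15 in Appendix D, the first j iterations of Oxx with the modified input $y' = z + \varepsilon y$ also select Q when $\varepsilon > 0$ is sufficiently small. Because $P_Q^\perp y' = \varepsilon P_Q^\perp y$ and (23) holds, the (j + 1)-th iteration of Oxx necessarily selects a wrong atom. ■

At this point, we have proved Theorem 5 which is relative to OMP.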

2) OLS ability to reach any subset: In order to prove Theorem 4, we establish that any subset Q can be reached using OLS with some input $y \in \mathrm{span}(A_Q)$ (Lemma 3). To generate y, we assign decreasing weight coefficients to the atoms $\{a_i, i \in Q\}$ with a rate of decrease which is high enough.

Proof of Lemma 3: Without loss of generality, we assume that the elements of Q correspond to the first j atoms.

Firstly, we define the vectors $\{v_1, \ldots, v_j\}$ resulting from the orthogonalization of $\{a_1, \ldots, a_j\}$: for all $i \leq j$, we have $\mathrm{span}(a_1, \ldots, a_i) = \mathrm{span}(v_1, \ldots, v_i)$ where $v_1 = a_1$ and for i > 1, $v_i$ is set to the orthogonal projection of $a_i$ onto $\mathrm{span}(a_1, \ldots, a_{i-1})^\perp$.

Secondly, for arbitrary values of $\varepsilon_2, \ldots, \varepsilon_j > 0$, we define the following recursive construction:

• $y_1 = v_1$,

• $y_i = y_{i-1} + \varepsilon_i v_i$ for $i \in \{2, \ldots, j\}$

($y_i$ implicitly depends on $\varepsilon_2, \ldots, \varepsilon_i$) and set $y = y_j$. We show by recursion that there exist $\varepsilon_2, \ldots, \varepsilon_j > 0$ such that OLS with $y_i$ as input successively selects $a_1, \ldots, a_i$ during the first i iterations (in particular, the selection rule (2) always yields a unique maximum).

The statement is obviously true for $y_1 = a_1$. Assume that it is true for $y_{i-1}$ with some $\varepsilon_2, \ldots, \varepsilon_{i-1} > 0$ (these parameters will remain fixed in the following). According to Lemma 15 in Appendix D, there exists $\varepsilon_i > 0$ such that OLS with $y_i = y_{i-1} + \varepsilon_i v_i$ as input selects the same atoms as with $y_{i-1}$ during the first i - 1 iterations, i.e., $a_1, \ldots, a_{i-1}$ are successively chosen. At iteration i, the current active set thus reads $Q' = \{1, \ldots, i-1\}$ and the OLS residual corresponding to $y_i$ takes the form

$$r_{Q'} = P_{Q'}^\perp y_{i-1} + \varepsilon_i P_{Q'}^\perp v_i = \varepsilon_i v_i$$

since $y_{i-1} \in \mathrm{span}(A_{Q'})$ and $v_i \in \mathrm{span}(A_{Q'})^\perp$. By construction, $v_i$ is equal to $\tilde a_i^{Q'} = P_{Q'}^\perp a_i$, thus $r_{Q'}$ is proportional to $\tilde a_i^{Q'}$ and then to $\tilde b_i^{Q'}$. Hence, the OLS criterion (2) is maximum for the atom $a_i$ and the maximum value is equal to $|\langle r_{Q'}, \tilde b_i^{Q'} \rangle| = \|r_{Q'}\|$ since $\tilde b_i^{Q'}$ is of unit norm.

Finally, we show that no other atom $a_l$ yields this maximum value. Apply Lemma 8 in Appendix B: the full rankness of $A_{Q' \cup \{i, l\}}$ (as a family of less than spark(A) atoms) implies that $[\tilde b_i^{Q'}, \tilde b_l^{Q'}]$ is full rank, thus $\tilde b_i^{Q'}$ and $\tilde b_l^{Q'}$ cannot be colinear. ■

Using Lemma 3, Proposition 1 simplifies to Theorem 4 in which the assumption that Q is reachable by OLS is omitted.
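
The construction used in the proof of Lemma 3 can be sketched numerically. The code below is an illustration under stated assumptions, not the authors' implementation: the weights are set to a geometric sequence $\varepsilon_i = \varepsilon^{i-1}$ with a single small $\varepsilon$ (the proof only requires each $\varepsilon_i$ to be sufficiently small), OLS is implemented naively with explicit projectors, and the names `ols_select` and `reaching_input`, the dictionary size and the seed are hypothetical.

```python
import numpy as np

def ols_select(A, y, n_iter):
    """Indices selected by OLS: each step maximizes |<r, b_i>| over the normalized projected atoms."""
    m, n = A.shape
    Q = []
    for _ in range(n_iter):
        P = np.eye(m) if not Q else np.eye(m) - A[:, Q] @ np.linalg.pinv(A[:, Q])
        r = P @ y                                   # current residual
        At = P @ A                                  # projected atoms
        norms = np.linalg.norm(At, axis=0)
        ok = norms > 1e-10                          # skip atoms lying in span(A_Q)
        scores = np.zeros(n)
        scores[ok] = np.abs(At[:, ok].T @ r) / norms[ok]
        if Q:
            scores[Q] = 0.0                         # never reselect an active atom
        Q.append(int(np.argmax(scores)))
    return Q

def reaching_input(A, Q, eps):
    """y = v_1 + eps*v_2 + eps^2*v_3 + ..., with v_i the projection of the i-th atom of Q
    onto the orthogonal complement of the span of the previous ones."""
    y = np.zeros(A.shape[0])
    for pos, i in enumerate(Q):
        prev = Q[:pos]
        v = A[:, i] if not prev else A[:, i] - A[:, prev] @ (np.linalg.pinv(A[:, prev]) @ A[:, i])
        y += eps**pos * v
    return y

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 20))
A /= np.linalg.norm(A, axis=0)
Q = [7, 3, 11]                                      # arbitrary target subset, in selection order
y = reaching_input(A, Q, 1e-3)
print(ols_select(A, y, len(Q)))
```

For a generic dictionary and a small enough $\varepsilon$, the printed indices are expected to match the target subset in order; a larger $\varepsilon$ may break the ordering, which is why the proof relies on Lemma 15.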
Fig. 3. Example 1: drawing of the plane $\mathrm{span}(a_1)^\perp$. The tilde notation refers to the subset Q = {1}. When $\theta_1$ is close to 0, $\tilde a_2$ is of very small norm since $a_2$ is almost equal to $a_1$, while $a_3$ and $a_4$, which are almost orthogonal to $a_1$, yield projections $\tilde a_3$ and $\tilde a_4$ that are almost of unit norm. The angles $(\tilde a_2, \tilde a_3)$ and $(\tilde a_2, \tilde a_4)$ tend to $\theta_2$ and $-\theta_2$ when $\theta_1 \to 0$. The bullet and square points correspond to positions r satisfying $|\langle r, \tilde a_2 \rangle| \geq |\langle r, \tilde a_3 \rangle|$ and $|\langle r, \tilde a_2 \rangle| \geq |\langle r, \tilde a_4 \rangle|$, respectively. These two cones only intersect at r = 0, therefore OMP cannot successively select $a_1$ and $a_2$ in the first two iterations.

3) OMP inability to reach some subsets: Contrary to OLS, OMP may not reach some subsets as stated in Example 1 in Section III. We now prove this result.

Proof of Example 1: Assume that OMP selects a true atom in the first iteration. Because there is a symmetry between $a_1$ and $a_2$, we can assume without loss of generality that $a_1$ is selected. We show that $a_3$ or $a_4$ is necessarily selected in the second iteration.

As the atom dimension is m = 3, the residual $r_{\{1\}}$ lies in $\mathrm{span}(a_1)^\perp$ which is of dimension 2. The simple projection calculation $\tilde a_i = a_i - \langle a_i, a_1 \rangle a_1$ (the tilde notation implicitly refers to Q = {1}) leads to:

$$\tilde a_2 = \sin(2\theta_1) \begin{bmatrix} \sin\theta_1 \\ \cos\theta_1 \\ 0 \end{bmatrix}, \quad \tilde a_3 = \begin{bmatrix} \sin\theta_1 \cos\theta_1 \cos\theta_2 \\ \cos^2\theta_1 \cos\theta_2 \\ \sin\theta_2 \end{bmatrix} \quad \text{and} \quad \tilde a_4 = \begin{bmatrix} \sin\theta_1 \cos\theta_1 \cos\theta_2 \\ \cos^2\theta_1 \cos\theta_2 \\ -\sin\theta_2 \end{bmatrix}.$$

It is noticeable that when $\theta_1$ is close to 0, $\|\tilde a_2\| = |\sin(2\theta_1)|$ is small while $\tilde a_3$ and $\tilde a_4$ are almost of unit norm, and the angles $(\tilde a_2, \tilde a_3)$ and $(\tilde a_2, \tilde a_4)$ tend to $\theta_2$ and $-\theta_2$ when $\theta_1 \to 0$ (see Fig. 3 for a 2D display in the plane $\mathrm{span}(a_1)^\perp$).

It is easy to check that the set of points $r \in \mathbb{R}^2$ satisfying $|\langle r, \tilde a_2 \rangle| \geq |\langle r, \tilde a_3 \rangle|$ is a 2D cone centered around the direction that is orthogonal to $\tilde a_3$ (dashed line in the south-east and north-west directions in Fig. 3). Specifically, both plain lines delimiting this cone are orthogonal to $\tilde a_3 + \tilde a_2$ and $\tilde a_3 - \tilde a_2$. Similarly, the set of points $r \in \mathbb{R}^2$ satisfying $|\langle r, \tilde a_2 \rangle| \geq |\langle r, \tilde a_4 \rangle|$ is another 2D cone centered around the direction that is orthogonal to $\tilde a_4$. When $\theta_1$ is close to 0, both 2D cones only intersect at r = 0 (since their inner angle tends towards 0), thus

$$\forall r \in \mathbb{R}^2 \setminus \{0\}, \quad |\langle r, \tilde a_2 \rangle| < \max(|\langle r, \tilde a_3 \rangle|, |\langle r, \tilde a_4 \rangle|).$$

We conclude that $a_2$ cannot be selected in the second iteration according to the OMP rule (1). ■
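
The strict inequality above can be checked on a grid of residual directions. The sketch below relies on two assumptions that are not stated in this appendix: the atoms of Example 1 (defined in Section III) are taken as $a_1 = [\cos\theta_1, -\sin\theta_1, 0]^t$, $a_2 = [\cos\theta_1, \sin\theta_1, 0]^t$, $a_3 = [0, \cos\theta_2, \sin\theta_2]^t$ and $a_4 = [0, \cos\theta_2, -\sin\theta_2]^t$, which is consistent with the projections displayed in the proof, and the angle values are purely illustrative.

```python
import numpy as np

theta1, theta2 = 0.05, np.pi / 4        # illustrative values, theta1 close to 0

# Assumed atoms of Example 1 (reconstructed from the projections in the proof).
a1 = np.array([np.cos(theta1), -np.sin(theta1), 0.0])
a2 = np.array([np.cos(theta1),  np.sin(theta1), 0.0])
a3 = np.array([0.0, np.cos(theta2),  np.sin(theta2)])
a4 = np.array([0.0, np.cos(theta2), -np.sin(theta2)])

def proj(a):
    """Projection onto span(a1)^perp (a1 has unit norm)."""
    return a - (a @ a1) * a1

at2, at3, at4 = proj(a2), proj(a3), proj(a4)

# Orthonormal basis (u, w) of the plane span(a1)^perp.
u = np.array([np.sin(theta1), np.cos(theta1), 0.0])
w = np.array([0.0, 0.0, 1.0])

# Unit residual directions r covering the whole plane.
phis = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
R = np.outer(np.cos(phis), u) + np.outer(np.sin(phis), w)

# gap > 0 means that a3 or a4 beats a2 in the OMP selection rule.
gap = np.maximum(np.abs(R @ at3), np.abs(R @ at4)) - np.abs(R @ at2)
print(gap.min() > 0, np.linalg.norm(at2), abs(np.sin(2 * theta1)))
```

Since the OMP criterion is homogeneous in r, testing unit-norm directions is sufficient for this check.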

Appendix B 
Re-expression of the ERC-Oxx formulas 

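In this appendix, we prove Lemma 1 by successively re-expressing $\tilde A_{Q^\star \setminus Q}^\dagger \tilde a_{\mathrm{bad}}$ and $\tilde B_{Q^\star \setminus Q}^\dagger \tilde b_{\mathrm{bad}}$. Let us first show that when $A_{Q^\star}$ is full rank, the matrices $\tilde A_{Q^\star \setminus Q}$ and $\tilde B_{Q^\star \setminus Q}$ are full rank. This result is stated below as a corollary of Lemma 8.

Lemma 8 If $Q \cap Q' = \emptyset$ and $A_{Q \cup Q'}$ is full rank, then $\tilde A_{Q'}$ and $\tilde B_{Q'}$ are full rank.

Proof: To prove that $\tilde A_{Q'}$ is full rank, we assume that $\sum_{j \in Q'} \alpha_j \tilde a_j = 0$ with $\alpha_j \in \mathbb{R}$. By definition of $\tilde a_j = P_Q^\perp a_j = a_j - P_Q a_j$, it follows that $\sum_{j \in Q'} \alpha_j a_j \in \mathrm{span}(A_Q)$. Since $A_{Q \cup Q'}$ is full rank, we conclude that all $\alpha_j$'s are 0.

The full rankness of $\tilde B_{Q'}$ directly follows from that of $\tilde A_{Q'}$ since for all $i \in Q'$, $\tilde b_i = \tilde a_i / \|\tilde a_i\|$ is colinear to $\tilde a_i$. ■

The direct application of Lemma 8 to our context, with $Q^\star \setminus Q$ playing the role of Q', leads to the following corollary.

Corollary 3 Assume that $A_{Q^\star}$ is full rank. For $Q \subsetneq Q^\star$, $\tilde A_{Q^\star \setminus Q}$ and $\tilde B_{Q^\star \setminus Q}$ are full rank.

Lemma 9 Assume that $A_{Q^\star}$ is full rank. For $Q \subsetneq Q^\star$, $\tilde A_{Q^\star \setminus Q}^\dagger \tilde a_{\mathrm{bad}} = (A_{Q^\star}^\dagger a_{\mathrm{bad}})_{|(Q^\star \setminus Q)}$ where $|$ denotes the restriction of a vector to a subset of its coefficients.

Proof: The orthogonal decomposition of $a_{\mathrm{bad}}$ on $\mathrm{span}(A_{Q^\star})$ takes the form:

$$a_{\mathrm{bad}} = A_{Q^\star} (A_{Q^\star}^\dagger a_{\mathrm{bad}}) + P_{Q^\star}^\perp a_{\mathrm{bad}}.$$

Projecting onto $\mathrm{span}(A_Q)^\perp$, we obtain

$$\tilde a_{\mathrm{bad}} = \tilde A_{Q^\star \setminus Q} (A_{Q^\star}^\dagger a_{\mathrm{bad}})_{|(Q^\star \setminus Q)} + P_{Q^\star}^\perp a_{\mathrm{bad}} \qquad (24)$$

($P_Q^\perp P_{Q^\star}^\perp = P_{Q^\star}^\perp$ because $\mathrm{span}(A_{Q^\star})^\perp \subset \mathrm{span}(A_Q)^\perp$). For $i \in Q^\star \setminus Q$, $\tilde a_i = a_i - P_Q a_i \in \mathrm{span}(A_{Q^\star})$. Thus, we have $\mathrm{span}(\tilde A_{Q^\star \setminus Q}) \subset \mathrm{span}(A_{Q^\star})$, and $P_{Q^\star}^\perp a_{\mathrm{bad}}$ is orthogonal to $\mathrm{span}(\tilde A_{Q^\star \setminus Q})$. According to Corollary 3, $\tilde A_{Q^\star \setminus Q}$ is full rank. It follows from (24) that $\tilde A_{Q^\star \setminus Q}^\dagger \tilde a_{\mathrm{bad}} = (A_{Q^\star}^\dagger a_{\mathrm{bad}})_{|(Q^\star \setminus Q)}$. ■

Lemma 10 Assume that $A_{Q^\star}$ is full rank. For $Q \subsetneq Q^\star$,

$$\|\tilde a_{\mathrm{bad}}\| \, \tilde B_{Q^\star \setminus Q}^\dagger \tilde b_{\mathrm{bad}} = \Delta_{\|\tilde a_i\|} (A_{Q^\star}^\dagger a_{\mathrm{bad}})_{|(Q^\star \setminus Q)}$$

where $\Delta_{\|\tilde a_i\|}$ stands for the diagonal matrix whose diagonal elements are $\{\|\tilde a_i\|, i \in Q^\star \setminus Q\}$.

Proof: The result directly follows from $\tilde a_{\mathrm{bad}} = \|\tilde a_{\mathrm{bad}}\| \tilde b_{\mathrm{bad}}$, $\tilde b_i = \tilde a_i / \|\tilde a_i\|$ for $i \in Q^\star \setminus Q$, and from Lemma 9. ■

Proof of Lemma 1: The result is obvious when $\tilde a_{\mathrm{bad}} = 0$. It follows from Lemmas 9 and 10 when $\tilde a_{\mathrm{bad}} \neq 0$. ■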
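
Lemmas 9 and 10 lend themselves to a quick numerical sanity check. The following is a minimal sketch on a random Gaussian dictionary; the sizes, the seed, the index choices and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 8, 12
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)
Qstar, Q, bad = [0, 1, 2, 3], [0, 1], 7          # hypothetical index choices
rest = [i for i in Qstar if i not in Q]          # Q* \ Q

P = np.eye(m) - A[:, Q] @ np.linalg.pinv(A[:, Q])   # projector onto span(A_Q)^perp
At = P @ A                                          # projected atoms
norms = np.linalg.norm(At, axis=0)
Bt = At[:, rest + [bad]] / norms[rest + [bad]]      # normalized projections (rest, then bad)

# Restriction of pinv(A_{Q*}) a_bad to the entries indexed by Q* \ Q.
coef = np.linalg.pinv(A[:, Qstar]) @ A[:, bad]
restricted = coef[[Qstar.index(i) for i in rest]]

# Lemma 9: pinv of the projected sub-dictionary applied to the projected bad atom.
lhs9 = np.linalg.pinv(At[:, rest]) @ At[:, bad]

# Lemma 10: same identity for normalized projections, up to the diagonal scaling.
lhs10 = norms[bad] * (np.linalg.pinv(Bt[:, :len(rest)]) @ Bt[:, -1])
rhs10 = norms[rest] * restricted

print(np.allclose(lhs9, restricted), np.allclose(lhs10, rhs10))
```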

Appendix C 

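Technical results needed for the proof of Lemma 2
With simplified notations, the expression (19) of $F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}})$ reads

$$\varphi(\eta) = |\sqrt{1 - \eta^2} - C\eta| + D\eta \qquad (25)$$

where $\eta \in (0, 1]$ and C and D take the form

$$C = \sum_{i=1}^{N} \frac{\beta_i \chi_i}{\eta_i} \qquad (26)$$

$$D = \sum_{i=1}^{N} \frac{|\beta_i|}{\eta_i} \qquad (27)$$

with $N \geq 1$, $\beta = [\beta_1, \ldots, \beta_N] \in \mathbb{R}^N$, and for all i, $\eta_i \in (0, 1]$ and $\chi_i \in [-1, 1]$ satisfy $\eta_i^2 + \chi_i^2 = 1$. Note that we can freely assume from (17) that $\chi_{\mathrm{bad}}^{Q,Q'} = \sqrt{1 - (\eta_{\mathrm{bad}}^{Q,Q'})^2} \geq 0$. When $\chi_{\mathrm{bad}}^{Q,Q'} < 0$, one just needs to replace $a_{\mathrm{bad}}$ by $-a_{\mathrm{bad}}$, leading to the replacement of $\beta$ by $-\beta$ in (26) and (27).

The succession of small lemmas hereafter aims at minorizing $\varphi(\eta)$ for arbitrary values of $\eta$, $\eta_i$, $\chi_i$ and $\beta$. They lead to the main minoration result of Lemma 14.

Lemma 11 Let $\beta \in \mathbb{R}^N$.

$$\text{If } C \leq 0, \quad \forall \eta \in [0, 1], \quad \varphi(\eta) \geq 1 + (\|\beta\|_1 - 1)\eta. \qquad (28)$$

$$\text{If } C > 0, \quad \min_{\eta \in [0,1]} \varphi(\eta) = \min\big(1, D/\sqrt{1 + C^2}\big). \qquad (29)$$

Proof: We first study the function $f(\eta) = \sqrt{1 - \eta^2} - C\eta$. We have f(0) = 1, f(1) = -C, and f is concave on [0, 1]. To minorize $\varphi(\eta) = |f(\eta)| + D\eta$, we distinguish two cases depending on the sign of C.

When $C \leq 0$, $f(\eta) \geq 0$ for all $\eta$. Since |f| = f is concave, it can be minorized by the secant line joining f(0) and f(1), therefore, $f(\eta) \geq 1 - (C + 1)\eta \geq 1 - \eta$. (28) follows from $\varphi(\eta) = |f(\eta)| + D\eta$ and $D \geq \|\beta\|_1$ (because the $\eta_i$ are all in (0, 1]).

When C > 0, $f(\eta) \geq 0$ for $\eta \in [0, z]$ and $f(\eta) < 0$ in (z, 1], with $z = 1/\sqrt{1 + C^2}$. $D \geq 0$ and f(z) = 0 imply that for $\eta > z$, $\varphi(\eta) \geq Dz = \varphi(z)$, thus the minimum of $\varphi$ is reached for $\eta \in [0, z]$. On [0, z], $\varphi(\eta) = f(\eta) + D\eta$ is concave, therefore the minimum value is either $\varphi(0) = 1$ or $\varphi(z) = Dz$. ■

The following two lemmas are simple inequalities linking C, D, and $\|\beta\|_1$.

Lemma 12 $\forall \beta \in \mathbb{R}^N$, $D^2 - C^2 \geq \|\beta\|_1^2$.

Proof: By developing $D^2$ and $C^2$ from (26) and (27), we get

$$D^2 - C^2 = \sum_{i,j} \frac{|\beta_i| |\beta_j| - \beta_i \beta_j \chi_i \chi_j}{\eta_i \eta_j}.$$

Since $\forall i$, $\eta_i^2 + \chi_i^2 = 1$, we have:

$$D^2 - C^2 = \|\beta\|_1^2 + \sum_{i \neq j} |\beta_i| |\beta_j| \Big[ \frac{1 - \sigma_i \sigma_j \chi_i \chi_j}{\eta_i \eta_j} - 1 \Big] \qquad (30)$$

with $\sigma_i = \mathrm{sgn}(\beta_i) = \pm 1$ if $\beta_i \neq 0$, and $\sigma_i = 1$ otherwise. Because $\eta_i$ and $\chi_i$ satisfy $\eta_i^2 + \chi_i^2 = 1$, they reread $\eta_i = \cos\theta_i$ and $\chi_i = \sin\theta_i$, so $\eta_i \eta_j + \sigma_i \sigma_j \chi_i \chi_j = \cos(\theta_i \pm \theta_j) \leq 1$ which proves that the last bracketed expression in (30) is non-negative. Finally, (30) yields $D^2 - C^2 \geq \|\beta\|_1^2$. ■

Lemma 13 $\forall \beta \in \mathbb{R}^N$, $\|\beta\|_1 \leq 1$ implies that $\|\beta\|_1 \leq D/\sqrt{1 + C^2}$.

Proof: $(1 + C^2) \|\beta\|_1^2 \leq \|\beta\|_1^2 + C^2 \leq D^2$ according to Lemma 12. ■

We can now establish the main lemma that will enable us to conclude that if $F^{\mathrm{OLS}}_{Q^\star,Q}(a_{\mathrm{bad}}) < 1$, $F^{\mathrm{OLS}}_{Q^\star,Q'}(a_{\mathrm{bad}})$ is monotonically nonincreasing when $Q' \supsetneq Q$ is growing.

Lemma 14 $\forall \beta \in \mathbb{R}^N$, $\forall \eta \in [0, 1]$, $\varphi(\eta) < 1$ implies that $\|\beta\|_1 \leq \varphi(\eta)$.

Proof: Apply Lemma 11.

When $C \leq 0$, (28) and $\varphi(\eta) < 1$ imply that $(\|\beta\|_1 - 1) < 0$. Since $\eta \leq 1$, the lower bound of (28) is larger than $1 + (\|\beta\|_1 - 1) = \|\beta\|_1$.

When C > 0, (29) and $\varphi(\eta) < 1$ imply that the minimum value of $\varphi$ on [0, 1] is $D/\sqrt{1 + C^2} < 1$, then $D^2 - C^2 < 1$. Lemmas 12 and 13 imply that $\|\beta\|_1 \leq 1$ and then $\|\beta\|_1 \leq D/\sqrt{1 + C^2} \leq \varphi(\eta)$. ■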
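
The statement of Lemma 14 can be probed by random sampling. The sketch below assumes the forms $\varphi(\eta) = |\sqrt{1 - \eta^2} - C\eta| + D\eta$, $C = \sum_i \beta_i \chi_i / \eta_i$ and $D = \sum_i |\beta_i| / \eta_i$ with $\eta_i^2 + \chi_i^2 = 1$; the sampling ranges, the seed and the tolerance are arbitrary choices made for illustration. Such a test cannot prove the lemma, it only looks for counter-examples.

```python
import numpy as np

def phi(eta, beta, eta_i, chi_i):
    """phi(eta) = |sqrt(1 - eta^2) - C*eta| + D*eta with C and D built from beta, eta_i, chi_i."""
    C = np.sum(beta * chi_i / eta_i)
    D = np.sum(np.abs(beta) / eta_i)
    return abs(np.sqrt(1.0 - eta**2) - C * eta) + D * eta

rng = np.random.default_rng(3)
checked, violations = 0, 0
for _ in range(20000):
    N = int(rng.integers(1, 5))                          # number of terms, between 1 and 4
    beta = rng.uniform(-1.0, 1.0, N) * rng.uniform(0.0, 1.0)
    ang = 0.999 * rng.uniform(-np.pi / 2, np.pi / 2, N)  # keeps eta_i = cos(ang) > 0
    eta_i, chi_i = np.cos(ang), np.sin(ang)
    eta = rng.uniform(0.0, 1.0)
    val = phi(eta, beta, eta_i, chi_i)
    if val < 1.0:                                        # hypothesis of the lemma
        checked += 1
        if np.abs(beta).sum() > val + 1e-12:             # conclusion violated?
            violations += 1

print(checked, violations)
```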

Appendix D 

Behavior of Oxx when the input vector is slightly modified 

Lemma 15 Let $y_1$ and $y_2 \in \mathbb{R}^m$. Assume that the selection rule (1)-(2) of Oxx with $y_1$ as input is strict in the first j > 0 iterations (the maximizer is unique). Then, when $\varepsilon > 0$ is sufficiently small, Oxx selects the same atoms with $y(\varepsilon) = y_1 + \varepsilon y_2$ as with $y_1$ in the first j iterations.

Proof: We show by recursion that there exists $\varepsilon_l > 0$ such that the first l iterations of Oxx (l = 1, ..., j) with $y(\varepsilon)$ and $y_1$ as inputs yield the same atoms whenever $\varepsilon < \varepsilon_l$.

Let $l \geq 1$. We denote by Q the subset of cardinality l - 1 delivered by Oxx with $y_1$ as input after l - 1 iterations. By assumption, Q is also yielded with $y(\varepsilon)$ when $\varepsilon < \varepsilon_{l-1}$. Since $y(\varepsilon) = y_1 + \varepsilon y_2$, the Oxx residual takes the form $r_Q = r_1 + \varepsilon r_2$ where $r_Q$, $r_1$ and $r_2$ are obtained by projecting $y(\varepsilon)$, $y_1$, and $y_2$, respectively, onto $\mathrm{span}(A_Q)^\perp$. Hence, for $i \notin Q$,

$$\langle r_Q, \tilde c_i \rangle = \langle r_1, \tilde c_i \rangle + \varepsilon \langle r_2, \tilde c_i \rangle. \qquad (31)$$

Let $a_{\mathrm{new}}$ denote the new atom selected by Oxx in the l-th iteration with $y_1$ as input and let $i_{\mathrm{new}}$ refer to the corresponding index in the dictionary. By assumption, the atom selection is strict, i.e.,

$$|\langle r_1, \tilde c_{\mathrm{new}} \rangle| > \max_{i \neq i_{\mathrm{new}}} |\langle r_1, \tilde c_i \rangle|. \qquad (32)$$

Taking the limit of (31) when $\varepsilon \to 0$, we obtain that for any i, $|\langle r_Q, \tilde c_i \rangle|$ tends toward $|\langle r_1, \tilde c_i \rangle|$. (32) implies that when $\varepsilon < \varepsilon_{l-1}$ is sufficiently small,

$$|\langle r_Q, \tilde c_{\mathrm{new}} \rangle| > \max_{i \neq i_{\mathrm{new}}} |\langle r_Q, \tilde c_i \rangle|$$

by continuity of $|\langle r_Q, \tilde c_i \rangle|$ ($i \neq i_{\mathrm{new}}$) and $|\langle r_Q, \tilde c_{\mathrm{new}} \rangle|$ with respect to $\varepsilon$. Thus, Oxx with $y(\varepsilon)$ as input selects $a_{\mathrm{new}}$ in the l-th iteration. ■